(57) Abstract



A data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units is disclosed. The
data structure includes an identifier including a plurality of fields for storing values corresponding to properties of the polymer. In one
embodiment, the fields are capable of storing binary values. The polymer may, for example, be a polysaccharide and the chemical units may
be saccharides. Also disclosed is a computer-implemented method for determining whether properties of a query sequence of chemical
units match properties of a polymer of chemical units. The query sequence is represented by a first data structure, tangibly embodied in a
computer-readable medium, including an identifier including a plurality of bit fields for storing values corresponding to properties of the
query sequence. The polymer is represented by a second data structure, tangibly embodied in a computer-readable medium, including an
identifier including a plurality of bit fields for storing values corresponding to properties of the polymer. The invention also relates to
methods of sequencing polymers such as nucleic acids, polypeptides and polysaccharides and methods for identifying a
polysaccharide-protein interaction. The invention also involves a notational system referred to as Property Encoded Nomenclature.
SYSTEM AND METHOD FOR NOTATING POLYMERS



Background 

Various notational systems have been used to encode classes of chemical units.
In such systems, a unique code is assigned to each chemical unit in the class. For
example, in a conventional notational system for encoding amino acids, a single letter of
the alphabet is assigned to each known amino acid. A polymer of chemical units can be
represented, using such a notational system, as a set of codes corresponding to the
chemical units. Such notational systems have been used to encode polymers, such as
proteins, in a computer-readable format. A polymer that has been represented in a
computer-readable format according to such a notational system can be processed by a
computer.

Conventional notational schemes for representing chemical units have
represented the chemical units as characters (e.g., A, T, G, and C for nucleic acids), and
have represented polymers of chemical units as sequences or sets of characters. Various
operations may be performed on such a notational representation of a chemical unit or a
polymer comprised of chemical units. For example, a user may search a database of
chemical units for a query sequence of chemical units. The user typically provides a
character-based notational representation of the sequence in the form of a sequence of
characters, which is compared against the character-based notational representations of
sequences of chemical units stored in the database. Character-based searching
algorithms, however, are typically slow because such algorithms search by comparing
individual characters in the query sequence against individual characters in the sequences
of chemical units stored in the database. The speed of such algorithms is therefore
related to the length of the query sequence, resulting in particularly poor performance for
long query sequences.
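
The character-by-character comparison described above can be sketched as follows. This is only an illustration of the conventional approach, not code from any particular database system; the function name `naive_search` and the sample sequences are assumptions made for the example.

```python
def naive_search(database, query):
    """Return indices of stored sequences that contain `query`.

    Every candidate position is checked one character at a time, so the
    work per position grows with the length of the query.
    """
    hits = []
    for index, sequence in enumerate(database):
        for start in range(len(sequence) - len(query) + 1):
            matched = True
            for offset, symbol in enumerate(query):
                # Compare a single query character with a single stored character.
                if sequence[start + offset] != symbol:
                    matched = False
                    break
            if matched:
                hits.append(index)
                break
    return hits


# Hypothetical character-encoded nucleic acid sequences.
database = ["ATGCCGTA", "GGGTTTAA", "CCGTATGC"]
print(naive_search(database, "CGTA"))  # prints [0, 2]
```

The innermost loop is the source of the cost noted in the text: each additional query character adds another comparison at every candidate position.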

Summary 

Polymers may be characterized by identifying properties of the polymers and
comparing those properties to reference polymers, a process referred to herein as
property encoded nomenclature (PEN). In one embodiment, the properties are encoded
using a binary notation system, and the comparison is accomplished by comparing the
binary representations of polymers. For instance, in one aspect a sample polymer is
subjected to an experimental constraint to modify the polymer, and the modified polymer is
compared to a reference database of polymers to identify a population of polymers
having a property that is the same as or similar to a property of the modified polymer. The
method may be repeated until the population of polymers in the reference database is
reduced to one and the identity of the sample polymer is known.

In one aspect the invention is directed to a notational system for representing
polymers of chemical units. The notational system is referred to as Property Encoded
Nomenclature (PEN). According to one embodiment of the notational system, a polymer
is assigned an identifier that includes information about properties of the polymer. For
example, in one embodiment, properties of a disaccharide are each assigned a binary
value, and an identifier for the disaccharide includes the binary values assigned to the
properties of the disaccharide. In one embodiment, the identifier is capable of being
expressed as a number, such as a single hexadecimal digit. The identifier may be stored
in a computer-readable medium, such as in a data unit (e.g., record or table entry) of a
polymer database. Polymer identifiers may be used in a number of ways. For example,
the identifiers may be used to determine whether properties of a query sequence of
chemical units match properties of a polymer of chemical units. One application of such
matching is searching a polymer database for a polymer or polymers having specified
properties.

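In one aspect the invention is directed to a data structure, tangibly embodied in a
computer-readable medium, representing a polymer of chemical units. In another aspect,
the invention is directed to a computer-implemented method for generating such a data
structure. The data structure may include an identifier that may include one or more
fields for storing values corresponding to properties of the polymer. At least one field
may be a non-character-based field. Each field may be capable of storing a binary value.
The identifier may be representable as a single-digit hexadecimal number.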
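
As a minimal sketch of such an identifier, the following packs four one-bit properties into an integer that prints as a single hexadecimal digit. The property names in `FIELDS` and their ordering are hypothetical, chosen only for illustration; the text above does not fix which properties occupy which bits.

```python
# Hypothetical one-bit properties of a disaccharide unit, most significant bit first.
FIELDS = ("iduronic", "sulfated_2O", "sulfated_6O", "n_sulfated")


def encode(properties):
    """Pack one bit per property into a small integer identifier."""
    identifier = 0
    for name in FIELDS:
        identifier = (identifier << 1) | int(bool(properties[name]))
    return identifier


def decode(identifier):
    """Recover the property values from an identifier."""
    width = len(FIELDS)
    return {
        name: bool((identifier >> (width - 1 - position)) & 1)
        for position, name in enumerate(FIELDS)
    }


unit = {"iduronic": True, "sulfated_2O": True, "sulfated_6O": False, "n_sulfated": True}
identifier = encode(unit)
print(format(identifier, "X"))  # prints D: four bits fit in one hexadecimal digit
```

Because each field is a bit rather than a character, the identifier can be compared with integer operations instead of string comparison.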

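The polymer may be any of a variety of polymers. For example, (1) the polymer
may be a polysaccharide and the chemical units may be saccharides; (2) the polymer may
be a nucleic acid and the chemical units may be nucleotides; or (3) the polymer may be a
polypeptide and the chemical units may be amino acids.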

The properties may be properties of the chemical units in the polymer. For
example, the properties may include charges of chemical units in the polymer, identities
of chemical units in the polymer, conformations of chemical units in the polymer, or
identities of substituents of chemical units in the polymer. The properties may be
properties of the polymer that are not properties of any individual chemical unit within
the polymer. Example properties include a total charge of the polymer, a total number of
sulfates of the polymer, a dye-binding of the polymer, a mass of the polymer,
compositional ratios of substituents, compositional ratios of iduronic versus glucuronic,
enzymatic sensitivity, degree of sulfation, charge, and chirality.

In another aspect, the invention is directed to a computer-implemented method
for determining whether properties of a query sequence of chemical units match
properties of a polymer of chemical units. The query sequence may be represented by a
first data structure, tangibly embodied in a computer-readable medium, including an
identifier that may include one or more bit fields for storing values corresponding to
properties of the query sequence. The polymer may be represented by a second data
structure, tangibly embodied in a computer-readable medium, including an identifier that
may include one or more bit fields for storing values corresponding to properties of the
polymer. The method may include acts of generating at least one mask based on the
values stored in the one or more bit fields of the first data structure, performing at least
one binary operation on the values stored in the one or more bit fields of the second data
structure using the at least one mask to generate at least one result, and determining
whether the properties of the query sequence match the properties of the polymer based
on the at least one result. The chemical units may, for example, be any of the chemical
units described above. Similarly, the properties may be any of the properties described
above.

In one embodiment, the act of generating includes an act of generating the at least
one mask as a sequence of bits that is equivalent to the values stored in the one or more
bit fields of the first data structure. In another embodiment, the act of generating
includes an act of generating the at least one mask as a sequential repetition of the values
stored in the one or more bit fields of the first data structure.
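
The two ways of generating a mask can be sketched as below. The sketch assumes that each chemical unit occupies four bits and that a polymer identifier is formed by concatenating its unit identifiers; both are assumptions for illustration, as are the function names.

```python
BITS_PER_UNIT = 4  # assumption: each chemical-unit identifier occupies four bits


def mask_from_query(query_bits):
    """Mask that is bit-for-bit equivalent to the query identifier."""
    return query_bits


def repeated_mask(query_bits, unit_count, bits_per_unit=BITS_PER_UNIT):
    """Mask formed by repeating the query bits once per chemical unit."""
    mask = 0
    for _ in range(unit_count):
        mask = (mask << bits_per_unit) | query_bits
    return mask


def apply_mask(polymer_bits, mask):
    """Binary operation on the polymer identifier: a logical AND with the mask."""
    return polymer_bits & mask


# Hypothetical polymer of three units with identifiers 0xD, 0x5 and 0x4.
polymer = 0xD54
mask = repeated_mask(0x4, unit_count=3)
print(hex(mask))                       # prints 0x444
print(hex(apply_mask(polymer, mask)))  # prints 0x444
```

In this example the repeated mask tests the same property bit in every unit with a single AND, instead of one comparison per unit.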

In a further embodiment, the at least one mask includes a plurality of masks and
the act of performing at least one binary operation includes acts of performing a logical
AND operation on the values stored in the one or more bit fields of the second data
structure using each of the plurality of masks to generate a plurality of intermediate
results, and combining the plurality of intermediate results using at least one logical OR
operation to generate the at least one result. In one embodiment, the act of determining
includes an act of determining that the properties of the query sequence match the
properties of the polymer when the at least one result has a non-zero value. In a further
embodiment, the at least one binary operation includes at least one logical AND
operation.
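
A minimal sketch of the multi-mask embodiment follows: each mask is ANDed with the polymer identifier, the intermediate results are ORed together, and a non-zero result is treated as a match. The function name and the example bit patterns are assumptions for illustration.

```python
from functools import reduce
from operator import or_


def matches_any(polymer_bits, masks):
    """AND the polymer identifier with each mask, OR the results, test for non-zero."""
    intermediates = [polymer_bits & mask for mask in masks]
    result = reduce(or_, intermediates, 0)
    return result != 0


# Hypothetical masks, each selecting one property bit of interest.
masks = [0b0100, 0b0010]
print(matches_any(0b1010, masks))  # True: the 0b0010 property is present
print(matches_any(0b1000, masks))  # False: neither masked property is present
```

With the non-zero criterion a polymer matches when it shares at least one masked bit with the query. Requiring every masked bit to be present would instead compare each intermediate result against its mask; that stricter test is not described in the text above and is mentioned only as a design alternative.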

In another aspect, the invention is directed to a database, tangibly embodied in a
computer-readable medium, for storing information descriptive of one or more polymers.
The database may include one or more data units (e.g., records or table entries)
corresponding to the one or more polymers. Each of the data units may include an
identifier that may include one or more fields for storing values corresponding to
properties of the polymer.

In another embodiment, the invention is directed to a data structure, tangibly
embodied in a computer-readable medium, representing a chemical unit of a polymer.
The data structure may comprise an identifier including one or more fields. Each field
may be for storing a value corresponding to one or more properties of the chemical unit.
At least one field may store a non-character-based value such as, for example, a binary or
decimal value.

Polymers may be characterized by identifying properties of the polymers and
comparing those properties to reference polymers, a process referred to herein as
property encoded nomenclature (PEN). In one embodiment, the properties are encoded
using a binary notation system, and the comparison is accomplished by comparing the
binary representations of polymers. For instance, in one aspect a sample polymer is
subjected to an experimental constraint to modify the polymer, and the modified polymer is
compared to a reference database of polymers to identify a population of polymers
having a property that is the same as or similar to a property of the modified polymer. The
method may be repeated until the population of polymers in the reference database is
reduced to one and the identity of the sample polymer is known.

In a system including a database of properties of polymers of chemical units, a
method for analyzing a sample polymer of chemical units having a known molecular
weight and length is provided according to one aspect of the invention.
The method includes the steps of:

(A) selecting, from the database, candidate polymers of chemical units having
the same length as the sample polymer of chemical units and having
molecular weights similar to the molecular weight of the sample polymer
of chemical units;

(B) performing an experiment on the sample polymer of chemical units; 

(C) measuring properties of the sample polymer of chemical units resulting 
from the experiment; and 

(D) eliminating, from the candidate polymers of chemical units, polymers of 
chemical units having properties that do not correspond to the 
experimental results. 

In some embodiments the method also includes the step of: 

(E) repeatedly performing the step (D) until the number of candidate 
polymers of chemical units falls below a predetermined threshold. 
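
Steps (A), (D) and (E) can be sketched in code as shown below; steps (B) and (C) are laboratory steps and appear only as a measured value passed in. The `Candidate` record, the mass tolerance, and the idea that an experiment is modeled as a prediction function paired with a measurement are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    units: Tuple[int, ...]  # hypothetical chemical-unit identifiers
    mass: float             # molecular weight in daltons


def select_candidates(database, length, mass, tolerance):
    """Step (A): keep polymers of the same length and similar molecular weight."""
    return [c for c in database
            if len(c.units) == length and abs(c.mass - mass) <= tolerance]


def eliminate(candidates, predict, measured):
    """Step (D): drop candidates whose predicted property disagrees with the measurement."""
    return [c for c in candidates if predict(c) == measured]


def narrow(candidates, experiments, threshold=2):
    """Step (E): repeat step (D) until fewer than `threshold` candidates remain."""
    for predict, measured in experiments:
        if len(candidates) < threshold:
            break
        candidates = eliminate(candidates, predict, measured)
    return candidates


database = [
    Candidate((0xD, 0x5), 500.0),
    Candidate((0x5, 0xD), 500.0),
    Candidate((0x4, 0x4), 420.0),
    Candidate((0xD, 0x5, 0x4), 760.0),
]
candidates = select_candidates(database, length=2, mass=500.2, tolerance=1.0)
# Hypothetical measurement: the experiment reveals the identifier of the first unit.
experiments = [(lambda c: c.units[0], 0xD)]
remaining = narrow(candidates, experiments)
print(remaining)
```

Here two candidates share the sample's length and approximate mass, and one measurement is enough to leave a single candidate.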

In other aspects the invention is a method for identifying a population of 
polymers of chemical units having the same property as a sample polymer of chemical 
units. The method includes the steps of determining a property of a sample polymer of 
chemical units, and comparing the property of the sample polymer to a reference 
database of polymers of known sequence and known properties to identify a population 
of polymers of chemical units having the same property as a sample polymer of chemical 
units, wherein the reference database of polymers includes identifiers corresponding to 
the chemical units of the polymers, each of the identifiers including a field storing a 
value corresponding to the property. 

In one embodiment the step of determining a property of the sample polymer
involves the use of mass spectrometry, such as, for example, matrix assisted laser
desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom
bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass
spectrometry (CAD) to determine the molecular weight of the polymer. MALDI-MS,
for instance, may be used to determine the molecular weight of the polymer with an
accuracy of approximately one Dalton.

The step of identifying a property of the polymer in other embodiments may 
involve the reduction in size of the polymer into pieces of several units in length that 
may be detected by strong ion exchange chromatography. The fragments of the polymer 
may be compared to the reference database polymers. 

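According to other aspects, the invention is a method for identifying a
subpopulation of polymers having a property in common with a sample polymer of
chemical units. The method involves the steps of applying an experimental constraint to
the polymer to modify the polymer, detecting a property of the modified polymer,
identifying a population of polymers of chemical units having the same molecular length
as the sample polymer, and identifying a subpopulation of the identified population of
polymers having the same property as the modified polymer. The steps may be performed
again on the modified polymer to produce a twice modified polymer, and a second
subpopulation having a property in common with the twice modified polymer may be
identified. Each of the steps may then be repeated until the number of polymers within
the subpopulation falls below a predetermined threshold of polymers.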

In yet another aspect the invention is a method for identifying a subpopulation of
polymers having a property in common with a sample polymer of chemical units. The
method involves the steps of applying an experimental constraint to the polymer to
modify the polymer, detecting a first property of the modified polymer, identifying a
population of polymers of chemical units having a second property in common with the
sample polymer, and identifying a subpopulation of the identified population of
polymers having the same first property as the modified polymer.

The experimental constraint may be any manipulation that alters the polymer in such
a manner that it will be possible to derive structural information about the polymer.
Examples include enzymatic digestion, e.g., with an exoenzyme, an endoenzyme, or a
restriction endonuclease; chemical digestion; chemical modification; interaction with a
binding compound; chemical peeling (i.e., removal of a monosaccharide unit); and
enzymatic modification.

The property of the polymer that is detected
may be the molecular weight or length of the polymer. In other embodiments the
property may be the compositional ratios of substituents or units, type of basic building
block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity,
secondary structure and conformation (i.e., position of helices), spatial distribution of
substituents, ratio of one set of modifications to another set of modifications (i.e., relative
amounts of 2-O sulfation to N-sulfation or ratio of iduronic acid to glucuronic acid), and
binding sites for proteins.

The properties of the modified polymer may be detected in any manner possible,
depending on the property and polymer being analyzed. In one embodiment the step
of detection involves mass spectrometry such as matrix assisted laser desorption
ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment
mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry
(CAD). Alternatively, the step of detection involves strong ion exchange
chromatography, for example, if the polymer has been digested into several smaller
fragments composed of several units each.

The method is based on a comparison of the sample polymer with a population of
polymers of the same length or having at least one property in common. In some
embodiments the population of polymers of chemical units includes every polymer
sequence having the molecular weight of the sample polymer. In other embodiments the
population of polymers of chemical units includes less than every polymer sequence
having the molecular weight of the sample polymer. According to some embodiments
the step of identifying includes selecting the population of polymers of chemical units
from a database including molecular weights of polymers of chemical units. Preferably
the database includes identifiers corresponding to chemical units of a plurality of
polymers, each of the identifiers including a field storing a value corresponding to a
property of the corresponding chemical unit.

According to another aspect of the invention a method for compositional analysis
of a sample polymer is provided. The method includes the steps of applying an
experimental constraint to the sample polymer to modify the sample polymer, detecting a
property of the modified sample polymer, and comparing the modified sample polymer
to a reference database of polymers of identical size as the polymer, wherein the
polymers of the reference database have also been subjected to the same experimental
constraint as the sample polymer, wherein the comparison provides a compositional
analysis of the sample polymer.

In some embodiments the compositional analysis reveals the number and type of
units within the polymer. In other embodiments the compositional analysis reveals the
identity of a sequence of chemical units of the polymer.

As in the aspects of the invention described above, the properties of the
polymer may be detected in any manner possible and will depend on the particular
property and polymer being analyzed. In one embodiment the step of detection involves
mass spectrometry such as matrix assisted laser desorption ionization mass spectrometry
(MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS)
and collision-activated dissociation mass spectrometry (CAD). Preferably the
experimental constraint applied to the polymer is an enzymatic or chemical reaction
which involves incomplete enzymatic digestion of the polymer and wherein the steps of
the method are repeated until the number of polymers within the reference database falls
below a predetermined threshold. Alternatively, the step of detection involves capillary
electrophoresis, particularly when the experimental constraint applied to the polymer
involves complete degradation of the polymer into individual chemical units.

In one embodiment the reference database includes identifiers corresponding to
chemical units of a plurality of polymers, each of the identifiers including a field storing
a value corresponding to a property of the corresponding chemical unit.

According to yet another aspect of the invention a method for sequencing a
polymer is provided. The method includes the steps of applying an experimental
constraint to the polymer to modify the polymer, detecting a property of the modified
polymer, identifying a population of polymers having the same molecular length as the
sample polymer and having molecular weights similar to the molecular weight of the
sample polymer, identifying a subpopulation of the identified population of polymers
having the same property as the modified polymer by eliminating, from the identified
population of polymers, polymers having properties that do not correspond to the
modified polymer, and repeating the steps of applying an experimental constraint, detecting
a property and identifying a subpopulation by applying additional experimental
constraints to the polymer and identifying additional subpopulations of polymers until
the number of polymers within the subpopulation is one and the sequence of the polymer
may be identified.






In another aspect the invention relates to a method for identifying a
polysaccharide-protein interaction, by contacting a protein-coated MALDI surface with a
polysaccharide containing sample to produce a polysaccharide-protein-coated MALDI
surface, removing unbound polysaccharide from the polysaccharide-protein-coated
MALDI surface, and performing MALDI mass spectrometry to identify the
polysaccharide that specifically interacts with the protein coated on the MALDI surface.

In one embodiment a MALDI matrix is added to the polysaccharide-protein-coated
MALDI surface. In other embodiments an experimental constraint may be
applied to the polysaccharide bound on the polysaccharide-protein-coated MALDI
surface before performing the MALDI mass spectrometry analysis. The experimental
constraint applied to the polymer in some embodiments is digestion with an exoenzyme
or digestion with an endoenzyme. In other embodiments the experimental constraint
applied to the polymer is selected from the group consisting of restriction endonuclease
digestion; chemical digestion; chemical modification; and enzymatic modification.


Brief Description of the Drawings 

FIG. 1 is a block diagram illustrating an example of a computer system for 
storing and manipulating polymer information. 

FIG. 2A is a diagram illustrating an example of a record for storing information
about a polymer and its constituent chemical units.

FIG. 2B is a diagram illustrating an example of a record for storing information 
about a polymer. 

FIG. 2C is a diagram illustrating an example of a record for storing information 
about constituent chemical units of a polymer. 

FIG. 3 is a flow chart illustrating an example of a method for determining
whether properties of a first chemical unit match properties of a second
chemical unit.

FIG. 4 is a dataflow diagram of a system for sequencing a polymer.

FIG. 5 is a flow chart of a process for sequencing a polymer.

FIG. 6 is a flow chart of a process for sequencing a polymer using a genetic
algorithm.

FIG. 8 is a mass line diagram.

FIG. 9 is a mass-line diagram for (A) Polysialic Acid with NAN and (B)
Polysialic Acid with NGN.

FIG. 10 is a graph (A) depicting cleavage by Hep III of either G, I, or [...]
linkages, and a graph (B) depicting the same study as in A but where cleavage was
performed with Hep I.

FIG. 11 is a graph depicting MALDI-MS analysis of the extended core structures [...].

FIG. 12 is a graph depicting MALDI-MS analysis of the PSA polysaccharide. (A) [...]
(C) Digest of (A) with galactosidase [...] N-acetylhexosaminidase from S. pneumoniae.
[...] Table of [...] schematic structure and theoretical molecular masses, in which the
symbols denote mannose, N-acetylglucosamine, galactose, and N-acetylneuraminic acid.
Peaks [...] are impurities, and the analyte peak is detected both as M-H
(m/z 2369.5) and as a monosodiated adduct (M+Na-2H, m/z 2392.6).

FIG. 13 is a graph depicting the results of enzymatic degradation of the
saccharide chain directly off of PSA. (A) PSA before the addition of exoenzymes. (B)
Treatment of (A) with sialidase results in a mass decrease of 287 Da, consistent with the
loss of one sialic acid residue. (C) Treatment of (B) with galactosidase. (D) Upon
treatment of (C) with hexosaminidase, a decrease of 393 Da indicates the loss of two
N-acetylglucosamine residues.

FIG. 14 is a graph depicting the results of treatment of biantennary and
triantennary saccharides with endoglycanase F2. (A) Treatment of the biantennary
saccharide results in a mass decrease of 348.6, indicating cleavage between the GlcNAc
[...] EndoF2 treatment of heat denatured PSA. There is a mass reduction of 1709.7 Da in the
molecular mass of PSA (compare [...]) indicating that the normal glycan
structure of PSA is biantennary.
Detailed Description

The invention relates in some aspects to methods for characterizing polymers to
identify structural properties of the polymers, such as the charge, the nature and number
of units of the polymer, the nature and number of chemical substituents on the units, and
the stereospecificity of the polymer. The structural properties of polymers may provide
useful information about the function of the polymer. For instance, the properties of the
polymer may reveal the entire sequence of units of the polymer, which is useful for
identifying the polymer. Similarly, if the sequence of the polymer was previously
unknown, the structural properties of the polymer are useful for comparing the polymer
to known polymers having known functions. The properties of the polymer may also
reveal that a polymer has a net charge or has regions which are charged. This
information is useful for identifying compounds that the polymer may interact with or
predicting which regions of a polymer may be involved in a binding interaction or have a
specific function.

Many methods have been described in the prior art for identifying polymers and
in particular for identifying the sequence of units of polymers. Once the sequence of a
polymer is identified the sequence information is stored in a database and may be used to
compare the polymer with other sequenced polymers. Databases such as GENBANK
enable the storage and retrieval of information relating to the sequences of nucleic acids
which have been identified by researchers all over the world. These databases typically
store information using notational systems that encode classes of chemical units by
assigning a unique code to each chemical unit in the class. For example, a conventional
notational system for encoding amino acids assigns a single letter of the alphabet to each
known amino acid. Such databases represent a polymer of chemical units using a set of
codes corresponding to the chemical units. Searches of such databases have typically
been performed using character-based comparison algorithms.

New methods for identifying structural properties of polymers which can utilize
Bioinformatics and which differ from the prior art methods of assigning a character to
each unit of a polymer have been discovered. These methods are referred to as PEN
(property encoded nomenclature). In one aspect, the invention is based on the
identification and characterization of properties of a polymer, rather than units of the
polymer, and the use of numeric identifiers to classify those properties and to facilitate
information processing relating to the polymer.
The ability to identify properties of polymers and to process information
concerning the properties of the polymers provides many advantages over prior art
methods of characterizing polymers and Bioinformatics. For instance, the methods of
the invention may be used to identify structural information about complex
polymers such as polysaccharides which were previously very difficult to analyze using
prior art methods.

The heterogeneity and the high degree of variability of the polysaccharide
building blocks have hindered prior art attempts to sequence these complex molecules.
With the advent of extremely sensitive techniques like High Pressure Liquid
Chromatography (HPLC), Capillary Electrophoresis (CE) and Mass Spectrometry (MS)
to isolate and characterize large biomolecules, significant advances have been made in
isolating and purifying polysaccharide fragments containing specific sequences, but
extensive experimental manipulation is still required to identify sequence
information. Additionally, in most of the prior art methods some knowledge of the
sequence is required in order to design the experimental manipulations that will enable
the sequencing of the polysaccharide. The methods of the invention provide simple and
rapid methods for identifying sequence information. Many other advantages will be
clear from the description of the preferred embodiments set forth below.

The present invention will be better understood in view of the following detailed
description of a particular embodiment, taken in conjunction with the drawings.

FIG. 1 shows an example of a computer system 100 for storing and manipulating
polymer information. The computer system 100 includes a polymer database 102 which
includes a plurality of records 104a-n storing information corresponding to a plurality of
polymers. Each of the records 104a-n may store information about properties of the
corresponding polymer, properties of the corresponding polymer's constituent chemical
units, or both. The polymers for which information is stored in the polymer database 102
may be any kind of polymers. For example, the polymers may include polysaccharides,
nucleic acids, or polypeptides.

A "polymer" as used herein is a compound having a linear and/or branched
backbone of chemical units which are secured together by linkages. In some but not all
cases the backbone of the polymer may be branched. The term "backbone" is given its
usual meaning in the field of polymer chemistry. The polymers may be heterogeneous in
backbone composition thereby containing any possible combination of polymer units
linked together such as peptide-nucleic acids. In some embodiments the polymers are
homogeneous in backbone composition and are, for example, a nucleic acid, a
polypeptide, a polysaccharide, a carbohydrate, a polyurethane, a polycarbonate, a
polyurea, a polyethyleneimine, a polyarylene sulfide, a polysiloxane, a polyimide, a
polyacetate, a polyamide, a polyester, or a polythioester. A "polysaccharide" is a
biopolymer comprised of linked saccharide or sugar units. A "nucleic acid" as used
herein is a biopolymer comprised of nucleotides, such as deoxyribose nucleic acid
(DNA) or ribose nucleic acid (RNA). A polypeptide as used herein is a biopolymer
comprised of linked amino acids.

As used herein with respect to linked units of a polymer, "linked" or "linkage"
means two entities are bound to one another by any physicochemical means. Any
linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced.
Such linkages are well known to those of ordinary skill in the art. Natural linkages,
which are those ordinarily found in nature connecting the chemical units of a particular
polymer, are most common. Natural linkages include, for instance, amide, ester and
thioester linkages. The chemical units of a polymer analyzed by the methods of the
invention may be linked, however, by synthetic or modified linkages. Polymers in which
the units are linked by covalent bonds will be most common, but polymers whose units
are linked by other means, such as hydrogen bonds, are also included.

The polymer is made up of a plurality of chemical units. A "chemical unit" as
used herein is a building block or monomer which can be linked directly or indirectly to
other building blocks or monomers to form a polymer. The polymer preferably is a
polymer of at least two different linked units. The particular type of unit will depend on
the type of polymer. For instance DNA is a biopolymer comprised of a deoxyribose
phosphate backbone composed of units of purines and pyrimidines such as adenine,
cytosine, guanine, thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine,
2,6-diaminopurine, hypoxanthine, and other naturally and non-naturally occurring
nucleobases, substituted and unsubstituted aromatic moieties. RNA is a biopolymer
comprised of a ribose phosphate backbone composed of units of purines and pyrimidines
such as those described for DNA but wherein uracil is substituted for thymine. DNA
units may be linked to the other units of the polymer by their 5' or 3' hydroxyl group
thereby forming an ester linkage. RNA units may be linked to the other units of the
polymer by their 5', 3' or 2' hydroxyl group thereby forming an ester linkage.

Whenever a nucleic acid is represented by a sequence of letters it will be
understood that the nucleotides are in 5' -> 3' order from left to right and that "A"
denotes adenosine, "C" denotes cytidine, "G" denotes guanosine, "T" denotes thymidine,
and "U" denotes uracil unless otherwise noted.

The chemical units of a polypeptide are amino acids, including the 20 naturally
occurring amino acids as well as modified amino acids. Amino acids may exist as
amides or free acids and are linked to the other units in the backbone of the polymers
through their α-amino group thereby forming an amide linkage to the polymer.

A polysaccharide is a polymer composed of monosaccharides linked ,„ one 
another. ,n many polysaccharides the basic building block of the polysaccharide is 
actually a ^saccharide .mi, which can be repeating or non-repeating. Thus, a unit when 

and « , mclude a monomeric bui.diug block (monosaccharide) or a dimeric building 
block (disaccharide). s 

A"plurality of chemical units" is a, leas, two units linked to one another 
The polymer, may be native or naurally-occumng pofymers which occur fa 
na^e or „o n -natura,,y occurring polymers which do no, exist in nature. The polymers 
really mdude a, ,east a poruon of a nattily occurring polymer. The po,yl can 
Be ,ola,ed or synthesis * „ mo . For examp|e> me ^ ^ ^ - 

na^ sources e., purified, as by cleavage ar,d ge, separata or may be synthesized 
e.g., .) amphfied in v/ fro by, for example, po.ymerase chain reaction (PCRV (ii) 

FIG. 2A illustrates an example of the format of a data unit 200 in the polymer
database 102 (i.e., one of the data units 104a-n). As shown in FIG. 2A, the data unit 200
may include a polymer ID 202, which is described in more detail below with respect to
FIG. 2B. The data unit 200 also may include one or more chemical unit identifiers (IDs)
204a-n corresponding to the chemical units that are constituents of the polymer
corresponding to the data unit 200. The chemical unit IDs 204a-n are described in more
detail below with respect to FIG. 2C. The format of the data unit 200 shown in FIG. 2A
is merely an example of a format that may be used to represent polymers in the polymer
database 102. Polymers may be represented in the polymer database in other ways. For
example, the data unit 200 may include only the polymer ID 202 or may only include
one or more of the chemical unit IDs 204a-n.

FIG. 2B illustrates an example of the polymer ID 202. The polymer ID 202 may
include one or more fields 202a-n for storing information about properties of the polymer
corresponding to the data unit 200 (FIG. 2A). Similarly, FIG. 2C illustrates an example
of the chemical unit ID 204a. The chemical unit ID 204a may include one or more fields
206a-m for storing information about properties of the chemical unit corresponding to
the chemical unit ID 204a. Although the following description refers to the fields 206a-m
of the chemical unit ID 204a, such description is equally applicable to the fields 202a-n
of the polymer ID 202 (and the fields of the chemical unit IDs 204b-n).

The fields 206a-m of the chemical unit ID 204a may store any kind of value that
is capable of being stored in a computer readable medium, such as, for example, a binary 
value, a hexadecimal value, an integral decimal value, or a floating point value. 
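
As an illustration only, the data unit 200 of FIG. 2A might be modeled in Python as
shown below. The class and attribute names (Identifier, DataUnit, polymer_id,
chemical_unit_ids) are hypothetical and do not appear in this description; they are
chosen only to mirror the polymer ID 202 and the chemical unit IDs 204a-n.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# A field value may be binary, hexadecimal, decimal or floating point;
# all of these can be held in an int or a float.
FieldValue = Union[int, float]


@dataclass
class Identifier:
    """An identifier made of property fields (cf. fields 202a-n or 206a-m)."""
    fields: List[FieldValue] = field(default_factory=list)


@dataclass
class DataUnit:
    """Hypothetical model of data unit 200: an optional polymer ID plus
    zero or more chemical unit IDs."""
    polymer_id: Optional[Identifier] = None
    chemical_unit_ids: List[Identifier] = field(default_factory=list)


# A data unit holding only chemical unit IDs (the polymer ID is optional).
unit = DataUnit(chemical_unit_ids=[Identifier([0, 1, 0, 0, 1]),
                                   Identifier([1, 0, 0, 0, 0])])
```

Both parts are optional in this sketch because, as noted above, a data unit may
include only the polymer ID or only the chemical unit IDs.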

Each field 206a-m may store information about any property of the corresponding 
chemical unit. Thus, the invention is useful for identifying properties of polymers. A 
"property" as used herein is a characteristic (e.g., structural characteristic) of the polymer 
that provides information (e.g., structural information) about the polymer. When the 
term property is used with respect to any polymer except a polysaccharide the property 
provides information other than the identity of a unit of the polymer or the polymer 
itself. A compilation of several properties of a polymer may provide sufficient 
information to identify a chemical unit or even the entire polymer but the property of the 
polymer itself does not encompass the chemical basis of the chemical unit or polymer. 

When the term property is used with respect to polysaccharides, to define a
polysaccharide property, it has the same meaning as described above except that due to
the complexity of the polysaccharide, a property may identify a type of monomeric
building block of the polysaccharide. Chemical units of polysaccharides are much more
complex than chemical units of other polymers, such as nucleic acids and polypeptides.
The polysaccharide unit has more variables in addition to its basic chemical structure
than other chemical units. For example, the polysaccharide may be acetylated or sulfated
at several sites on the chemical unit, or it may be charged or uncharged. Thus one
property of a polysaccharide may be the identity of one or more basic building blocks of
the polysaccharides.

A basic building block alone, however, may not provide information about the
charge and the nature of substituents of the saccharide or disaccharide. For example, a
building block of uronic acid may be iduronic or glucuronic acid. Each of these building
blocks may have additional substituents that add complexity to the structure of the
chemical unit. A single property, however, may not identify such additional substituents,
charges, etc., in addition to identifying a complete building block of a polysaccharide.
This information, however, may be assembled from several properties. Thus, a property
of a polymer as used herein does not encompass an amino acid or nucleotide but does
encompass a saccharide or disaccharide building block of a polysaccharide.

A type of property that provides information about a polymer may depend on a
type of polymer being analyzed. For instance, if the polymer is a polysaccharide,
properties such as charge, molecular weight, nature and degree of sulfation or
acetylation, and type of saccharide may provide information about the polymer.
Properties may include, but are not limited to, charge, chirality, nature of substituents,
quantity of substituents, molecular weight, molecular length, compositional ratios of
substituents or units, type of basic building block of a polysaccharide, hydrophobicity,
enzymatic sensitivity, hydrophilicity, secondary structure and conformation (i.e., position
of helices), spatial distribution of substituents, ratio of one set of modifications to
another set of modifications (i.e., relative amounts of 2-O sulfation to N-sulfation or ratio
of iduronic acid to glucuronic acid), and binding sites for proteins. Other properties may
be identified by those of ordinary skill in the art. A substituent, as used herein, is an atom
or group of atoms that substitute a unit, but are not themselves the units.

A property of a polymer may be identified by any means known in the art. The
procedure used to identify a property may depend on a type of property. Molecular
weight, for instance, may be determined by several methods including mass
spectrometry. The use of mass spectrometry for determining the molecular weight of
polymers is well known in the art. Mass spectrometry has been used as a powerful tool
to characterize polymers because of its accuracy (±1 Dalton) in reporting the masses of
fragments generated (e.g., by enzymatic cleavage), and also because only pM sample
concentrations are required. For example, matrix-assisted laser desorption ionization
mass spectrometry (MALDI-MS) has been described for identifying the molecular
weight of polysaccharide fragments in publications such as Rhomberg, A. J. et al., PNAS,
USA, v. 95, p. 4176-4181 (1998); Rhomberg, A. J. et al., PNAS, USA, v. 95, p. 12232-
12237 (1998); and Ernst, S. et al., PNAS, USA, v. 95, p. 4182-4187 (1998), each of
which is hereby incorporated by reference. Other types of mass spectrometry known in
the art, such as electrospray-MS, fast atom bombardment mass spectrometry (FAB-
MS) and collision-activated dissociation mass spectrometry (CAD), can also be used to
identify the molecular weight of the polymer or polymer fragments.

The mass spectrometry data may be a valuable tool to ascertain information about
the polymer fragment sizes after the polymer has undergone degradation with enzymes
or chemicals. After a molecular weight of a polymer is identified, it may be compared to
molecular weights of other known polymers. Because masses obtained from the mass
spectrometry data are accurate to one Dalton (1 D), a size of one or more polymer
fragments obtained by enzymatic digestion may be precisely determined, and a number
of substituents (i.e., sulfates and acetate groups present) may be determined. One
technique for comparing molecular weights is to generate a mass line and compare the
molecular weight of the unknown polymer to the mass line to determine a subpopulation
of polymers which have the same molecular weight. A "mass line" as used herein is an
information database, preferably in the form of a graph or chart, which stores information
for each possible type of polymer having a unique sequence based on the molecular
weight of the polymer. Thus, a mass line may describe a number of polymers having a
particular molecular weight. A two-unit nucleic acid molecule (i.e., a nucleic acid
having two chemical units) has 16 (4^2, for four possible units) possible polymers at a
molecular weight corresponding to two nucleotides. A two-unit polysaccharide (i.e.,
disaccharide) has 32 possible polymers at a molecular weight corresponding to two
saccharides. Thus, a mass line may be generated by uniquely assigning a particular mass
to a particular length of a given fragment (all possible di, tetra, hexa, octa, up to a
hexadecasaccharide), and tabulating the results (an example is shown in Figure 8).
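
A mass line can be pictured as a lookup from molecular weight to the set of candidate
sequences having that weight. The following Python sketch builds such a lookup by
enumerating every sequence of a given length. The function name build_mass_line and
the unit masses in the usage lines are placeholders chosen for illustration; they are
not measured values and are not taken from this description.

```python
from collections import defaultdict
from itertools import product


def build_mass_line(unit_masses, length, ndigits=2):
    """Map each total mass to the sequences of `length` units having it.

    unit_masses: dict of unit code -> mass (placeholder values here).
    Masses are rounded to `ndigits` so that equal sums share one key.
    """
    mass_line = defaultdict(list)
    for seq in product(sorted(unit_masses), repeat=length):
        total = round(sum(unit_masses[u] for u in seq), ndigits)
        mass_line[total].append("".join(seq))
    return dict(mass_line)


# Placeholder masses for a four-letter alphabet (illustrative only).
masses = {"A": 1.0, "C": 2.0, "G": 4.0, "T": 8.0}
line = build_mass_line(masses, length=2)

# Four possible units at each of two positions give 16 sequences in total.
total_sequences = sum(len(seqs) for seqs in line.values())
```

Looking up a measured mass in the resulting dictionary returns the subpopulation of
sequences sharing that mass, which is the comparison step described above.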

Table 1 below shows an example of a computed set of values for a
polysaccharide. From Table 1, a number of chemical units of a polymer may be
determined from the minimum difference in mass between a fragment of length n+1 and
a fragment of length n. For example, if the repeat is a disaccharide unit, a fragment of
length n has 2n monosaccharide units; n = 1 may correspond to a disaccharide and
n = 2 may correspond to a tetrasaccharide.

Fragment Length n     Minimum difference in mass
                      between n+1 and n (Dalton)
1                     101.13
2                     13.03
3                     13.03
4                     9.01
5                     9.01
6                     4.99
7                     4.99
8                     0.97
9                     0.97

TABLE 1



Because mass spectrometry data indicates [...] replaced a sulfate group (80.06 D)
[...].

In addition to molecular weight, other [...] polymer. Additionally, a number of
substituents or chemical units can be determined using calculations based on the
molecular weight of the polymer.

In the method of capillary gel-electrophoresis, reaction samples may be analyzed
by small-diameter, gel-filled capillaries. The small diameter of the capillaries (50 µm)
allows for efficient dissipation of heat generated during electrophoresis. Thus, high field
strengths can be used without excessive Joule heating (400 V/m), lowering the separation
time to about 20 minutes per reaction run, and thereby increasing resolution over
conventional gel electrophoresis. Additionally, many capillaries may be analyzed in
parallel, allowing amplification of generated polymer information.

In addition to being useful for identifying a property, compositional analysis also
may be used to determine a presence and composition of an impurity as well as a main
property of the polymer. Such determinations may be accomplished if the impurity does
not contain an identical composition as the polymer. Determining whether an impurity
is present may involve accurately integrating an area under each peak that appears in the
electrophoretogram and normalizing the peaks to the smallest of the major peaks. The
sum of the normalized peaks should be equal to one or close to being equal to one. If it
is not, then one or more impurities are present. Impurities even may be detected in
unknown samples if at least one of the disaccharide units of the impurity differs from any
disaccharide unit of the unknown.

If an impurity is present, one or more aspects of a composition of the components
may be determined using capillary electrophoresis. Because all known disaccharide
units may be baseline-separated by the capillary electrophoresis method described above
and because migration times typically are determined using electrophoresis (i.e., as
opposed to electroosmotic flow) and are reproducible, reliable assignment to a polymer
fragment of the various saccharide units may be achieved. Consequently, both a
composition of the major peak and a composition of a minor contaminant may be
assigned to a polymer fragment. The composition of both the major and minor
components of a solution may be assigned as described below.

One example of such assignment of compositions involves determining the
composition of the major AT-III binding HLGAG decasaccharide (+ DDD4-7) and its
minor contaminant (+ D5D4-7) present in solution in a 9:1 ratio. Complete digestion of
this 9:1 mixture with heparinases yields 4 peaks: three representative of the major
decasaccharide (viz., D, 4, and -7) which are also present in the contaminant and one
peak that is present only in the contaminant. In other words, the area of each peak for
D, 4, and -7 represents an additive combination of a contribution from the major
decasaccharide and the contribution from the contaminant, whereas the peak for 5
represents only the contaminant.

To assign the composition of the contaminant [...] area [...] of 4, -7 and D [...] yields a
1:1:3 ratio [...] the composition of the impurity is two Ds, one 4, one -7 and one 5.
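
The peak-area bookkeeping for the 9:1 mixture described above can be checked with a few
lines of arithmetic. The Python sketch below is an illustration under a stated
assumption that is not taken from this description: each disaccharide unit is assumed
to contribute equally to its peak area, so areas are proportional to unit counts. The
function and variable names are hypothetical.

```python
from fractions import Fraction

# Disaccharide composition of each species, read from the notation above.
major = {"D": 3, "4": 1, "-7": 1}            # + DDD4-7
minor = {"D": 2, "5": 1, "4": 1, "-7": 1}    # + D5D4-7


def peak_areas(major_parts, minor_parts):
    """Expected peak areas, assuming area is proportional to unit count."""
    areas = {}
    for comp, parts in ((major, major_parts), (minor, minor_parts)):
        for unit, count in comp.items():
            areas[unit] = areas.get(unit, 0) + parts * count
    return areas


areas = peak_areas(9, 1)        # 9:1 mixture
per_unit = areas["5"]           # the 5 peak comes only from the contaminant

# Remove the contaminant's contribution from each peak it shares.
corrected = {u: areas[u] - per_unit * minor[u] for u in major}
smallest = min(corrected.values())
ratio = {u: Fraction(a, smallest) for u, a in corrected.items()}
```

Under this assumption the corrected areas for 4, -7 and D come out in a 1:1:3 ratio,
matching the composition of the major decasaccharide.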

[...] of identifying other types of properties may be [...]
chromatography (RP-HPLC). Enzymatic [...]
may be determined [...] in a similar manner
as enzymatic degradation, by exposing a substrate to the enzyme and using [...]
transfer a sulfate group to an HS chain having a concomitant increase in 80 Da.
Conformation may be determined by modeling and nuclear magnetic resonance (NMR).
The relative amount of sulfation may be determined by compositional analysis or
approximately determined by Raman spectroscopy.

In some aspects the invention is useful for generating, [...]
information about polymers. In [...]
polymer is assigned a unique numeric identifier, which may be used to classify the
complete building block. For instance, if a polysaccharide is being [...]
saccharide and all of its substituents, charges, etc. A basic building block [...]
information is generated and processed in the same manner as described above with
respect to "properties" of polymers.

Currently, saccharide fragments are detected in capillary electrophoresis by
monitoring at 232 nm, the wavelength at which the Δ4,5 double bond, generated upon
heparinase cleavage, absorbs. However, other detection methods are possible. First,
nitrous acid cleavage of heparin fragments, followed by reduction with 3H-sodium
borohydride, yields degraded fragments having a 3H radioactive tag. This provides
a tag which may be followed either by capillary electrophoresis (counting radioactivity)
or by mass spectrometry (by the increase in mass). Another method of using radioactivity
would be to label the heparin fragment with 35S. Similar to the types of detection
possible for 3H-labeled fragments, 35S-labeled fragments may be useful for radioactive
detection (CE) or measurement of mass differences (MS).

Especially in the case of 35S, this detection will be powerful. In this case, the
human sulfotransferases may be used to label specifically a certain residue. This will
give additional structural information.

Nitrous acid degraded fragments, unlike heparinase-derived fragments, do not
have a UV-absorbing chromophore. As we have shown, MALDI-MS will record the
mass of heparin fragments regardless of how they are derived. For CE, two methods
may be used to monitor fragments that lack a suitable chromophore. The first is indirect
detection of fragments. We may detect heparin fragments with our CE methodology
using a suitable background absorber, e.g., 1,5-naphthalenedisulfonic acid. The second
method for detection involves chelation of metal ions by saccharides. The saccharide-
metal complexes may be detected using UV-Vis just like monitoring the unsaturated
double bond.

Other groups have begun the process of raising antibodies to specific HLGAG
sequences. We have previously shown that proteins, e.g., angiogenin, FGF, may be used
as the complexing agent instead of a synthetic, basic peptide. By extension, antibodies
could be used as a complexing agent for MALDI-MS analysis. This enables us to
determine whether specific sequences are present in an unknown sample simply by
observing whether a given antibody with a given sequence specificity complexes with
the unknown using MALDI-MS.

The final point is that using mass tags, we may distinguish the reducing end of a
glycosaminoglycan from the non-reducing end. All of these tags involve selective
chemistry with the anomeric OH (present at the reducing end of the polymer) [...]
the reducing end of the chain [...]. In general, tags involve chemistry
[...] the anomeric position to form [...]
anomeric OH to form semicarbazides. Commonly used tags [...]
include the following compounds:

1. semicarbazide
2. Girard's P reagent
3. Girard's T reagent
4. p-aminobenzoic ethyl ester
5. biotin-x-hydrazide
6. 2-aminobenzamide
7. 2-aminopyridine
8. anthranilic acid
10. 8-aminonaphthalene-1,3,6-trisulfonic acid
11. 2-aminoacridone

FIG. 2D [...] of the chemical unit ID 204a [...]. The chemical unit
ID 204a contains one or more fields 212a-e for storing information about properties of a
[...] described in more detail with respect to polysaccharides because of the complex
nature of polysaccharides. The invention, however, is not limited to polysaccharides.
[...] heparin-like glycosaminoglycan (HLGAG) fragment and the high
[...] these complex molecules. Heparin-like glycosaminoglycans (HLGAGs), which
[...], are complex polysaccharide molecules made up of
repeat units comprising hexosamine and glucuronic/iduronic acid [...].
[...] The defining unit may be modified by
sulfation at the N, 3-O and 6-O positions of the hexosamine, 2-O sulfation of the uronic
acid, and C5 epimerization that converts the glucuronic acid to iduronic acid. The
disaccharide unit of HLGAG may be represented as:

(α 1→4) I/G 2X (α/β 1→4) H 3X,6X NY (α 1→4)

[...] acetylated (-COCH3) or, in rare cases, neither sulfated nor acetylated.

The fields 212a-e may store any kinds of values, such as, for example, single-bit
values, single-digit hexadecimal values, or decimal values. In one embodiment, the
chemical unit ID 204a includes each of the following fields: (1) a field 212a for storing a
value indicating whether the polymer contains an iduronic or a glucuronic acid (I/G); (2)
a field 212b for storing a value indicating whether the 2X position of the iduronic or
glucuronic acid is sulfated or unsulfated; (3) a field 212c for storing a value indicating
whether the 6X position of the hexosamine is sulfated or unsulfated; (4) a field 212d
indicating whether the 3X position of the hexosamine is sulfated or unsulfated; and (5) a
field 212e indicating whether the NX position of the hexosamine is sulfated or acetylated.
Optionally, each of the fields 212a-e may be represented as a single bit.

Table 2 illustrates an example of a data structure having a plurality of entries, 
where each entry represents an HLGAG encoded in accordance with Fig. 2D. Bit values 
for each of the fields 212a-e may be assigned in any known manner. For example, with 
respect to field 212a (I/G), a value of one may indicate Iduronic and a value of zero may
indicate Glucuronic, or vice versa.
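
The single-bit encoding of the five fields can be sketched in Python as follows. This is
a minimal illustration, not the encoding of any particular table: the field order
I/G, 2X, 6X, 3X, NX (most significant bit first) and the helper names encode_unit and
decode_unit are assumptions, as is the convention that a value of one means iduronic
or sulfated.

```python
# Assumed field order, most significant bit first.
FIELDS = ("I/G", "2X", "6X", "3X", "NX")


def encode_unit(props):
    """Pack a dict of single-bit properties into a 5-bit integer."""
    code = 0
    for name in FIELDS:
        bit = props.get(name, 0)
        if bit not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {bit!r}")
        code = (code << 1) | bit
    return code


def decode_unit(code):
    """Unpack a 5-bit integer back into a dict of properties."""
    n = len(FIELDS)
    return {name: (code >> (n - 1 - i)) & 1 for i, name in enumerate(FIELDS)}


# Assumed convention: I/G = 1 means iduronic, and 1 means sulfated.
code = encode_unit({"I/G": 1, "2X": 1, "6X": 0, "3X": 0, "NX": 1})
```

Swapping the meaning of one and zero for any field only changes the interpretation
of the bits, not the packing itself.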



[Table 2: columns I/G, 2X, 6X, 3X, NX, ALPH CODE, DISACC, MASS (AU); the table
entries are not recoverable.]

[...] fields) into a single byte or sequence of bytes. Furthermore, bit fields may be
stored and manipulated quickly and efficiently by digital computer processors, which
typically store information using bits and which typically can quickly perform operations
(e.g., shift, AND, OR) on bits. For example, as described in more detail below, a
plurality of properties each stored as a bit field can be searched more quickly than
searches conducted using typical character-based searching methods.

Further, using bit fields to represent properties of HLGAGs permits a user to
more easily incorporate additional properties (e.g., 4-O sulfation vs. unsulfation) into a
chemical unit ID 204a by adding extra bits to represent the additional properties.

In one embodiment, the four fields 212b-e (each of which may store a single-bit
value) may be represented as a single hexadecimal (base 16) number where each of the
fields 212b-e represents one bit of the hexadecimal number. Using hexadecimal numbers
to represent disaccharide units is convenient both for representation and processing
because hexadecimal digits are a common form of representation used by conventional
computers.

Optionally, the five fields 212a-e of the record 210 may be represented as a signed
hexadecimal digit, in which the fields 212b-212e collectively encode a single-digit
hexadecimal number as described above and the I/G field is used as a sign bit. In such a
signed representation, the hexadecimal numbers 0-F may be used to code chemical units
containing iduronic acid and the hexadecimal numbers -0 to -F may be used to code units
containing glucuronic acid. The chemical unit ID 204a may, however, be encoded using
other forms of representations, such as by using a twos-complement representation.
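
The signed hexadecimal form can be sketched as a pair of Python helpers. The names
to_signed_hex and from_signed_hex are hypothetical, and the sketch assumes that a set
I/G bit is written as a minus sign; which acid the sign denotes depends on the
convention chosen for the I/G field.

```python
def to_signed_hex(code):
    """Format a 5-bit code as a signed hexadecimal digit.

    The low four bits (2X, 6X, 3X, NX) form the digit and the top bit
    (I/G) is the sign. Assumption: a set I/G bit is written as '-'.
    """
    if not 0 <= code < 32:
        raise ValueError("code must fit in five bits")
    sign = "-" if code & 0b10000 else ""
    return f"{sign}{code & 0b1111:X}"


def from_signed_hex(text):
    """Inverse of to_signed_hex; '-0' and '0' are distinct codes."""
    negative = text.startswith("-")
    digit = int(text[1:] if negative else text, 16)
    if digit > 0xF:
        raise ValueError("expected a single hexadecimal digit")
    return (0b10000 if negative else 0) | digit
```

Because -0 and 0 name different units, the sign is kept as a separate bit here rather
than stored as an ordinary negative integer, which could not tell the two apart.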

The fields 212a-e of the chemical unit ID 204a may be arranged in any order.
For example, a gray code system may be used to code HLGAGs. In a gray code
numbering scheme, each successive value differs from the previous value only in a
single bit position. For example, in the case of HLGAGs, the values representing
HLGAGs may be arranged so that any two neighboring values differ in the value of only
one property. An example of a gray code system used to code HLGAGs is shown in
Table 3.

TABLE 3

Table 3 illustrates that use of a gray coding scheme arranges the disaccharide
building blocks such that neighboring table entries differ from each other only in the
value of a single property. One advantage of using gray codes to encode HLGAGs is that
a biosynthesis of HLGAG fragments may follow a specific sequence of modifications
starting from the basic building block G-HNAc.
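
The defining property of a gray code, that neighboring values differ in a single bit,
can be demonstrated with the reflected binary gray code. This Python sketch uses that
standard construction as an assumption; the particular ordering used in Table 3 may
differ. The function names are hypothetical.

```python
def gray(n):
    """Return the n-th value of the reflected binary gray code."""
    return n ^ (n >> 1)


def differs_in_one_bit(a, b):
    """True when a and b differ in exactly one bit position."""
    return bin(a ^ b).count("1") == 1


# Order the sixteen 4-bit codes so that neighbors differ in one property.
ordering = [gray(i) for i in range(16)]
```

Each bit corresponds to one property, so stepping through the ordering changes one
property at a time.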

In Table 3, bit weights of 8, 4, 2, and 1 are used to calculate the numerical
equivalent of a hexadecimal number with the most significant bit (I/G) being used as a
sign bit. For example, the hexadecimal code A (01010 binary) is equal to 8*1 + 4*0 +
2*1 + 1*0 = 10.

In another embodiment, the weights of each of the fields 212a-e may be changed
thereby implementing an alternative weighting system. For example, bit fields 212a-e
may have weights of 16, 8, 4, -2, and -1, respectively, as shown in Table 4.

[...] HLGAGs, may receive a low score based on a scheme in which the bits are weighted
in the manner shown in Table 4.
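
Both weighting schemes reduce to a sum of weight times bit. The Python sketch below
shows the arithmetic for the hexadecimal code A under the weights 8, 4, 2, 1 with a sign
bit, and under the alternative weights 16, 8, 4, -2, -1. The function names are
hypothetical, and treating a set sign bit as negating the magnitude is an assumption.

```python
def weighted_value(bits, weights):
    """Sum of weight * bit over corresponding positions."""
    if len(bits) != len(weights):
        raise ValueError("bits and weights must have the same length")
    return sum(w * b for w, b in zip(weights, bits))


def signed_value(bits):
    """Treat bits[0] (I/G) as a sign bit and weight the rest 8, 4, 2, 1.

    Assumption: a set sign bit negates the magnitude.
    """
    magnitude = weighted_value(bits[1:], (8, 4, 2, 1))
    return -magnitude if bits[0] else magnitude


code_a = (0, 1, 0, 1, 0)                            # hexadecimal A
value_a = signed_value(code_a)                      # 8*1 + 4*0 + 2*1 + 1*0
alt = weighted_value(code_a, (16, 8, 4, -2, -1))    # alternative weights
```

With the alternative weights the same bit pattern scores 8 - 2 = 6, which shows how
negative weights lower the score of units carrying the corresponding properties.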

Optionally, the sulfation and acetylation positions may be arranged as shown
in Table 2: I/G, 2X, 6X, 3X, NX. These positions may, however, be arranged
differently, resulting in a same set of codes representing different disaccharide units.
Table 5, for example, shows an arrangement in which the positions are arranged as I/G,
2X, NX, 3X, 6X.

TABLE 5

It has been observed that disaccharide units in some HLGAG sequences are
neither N-sulfated nor N-acetylated. Such disaccharide units may be represented using
the chemical unit ID 204a in any of a number of ways.

If the properties of a chemical unit are represented by bit fields, disaccharide
[...] may correspond to a free amine, and an NY field having a value of one may
correspond to N-acetylation, or vice versa. Further, a value of one in the NX field 212e
[...] permissible values. For example, a value of
zero may correspond to a free amine, a value of one may correspond to N-acetylation,
and a value of two could correspond to N-sulfation. Similarly, the values of any of the
fields 212a-e may be represented using a number system with a base higher than two.
For example, if the value of the field 212e (NX) is represented by a single-digit number
having a base of three, then the field 212e may store three permissible values.
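
One way to combine four binary fields with a base-three NX field is a mixed-radix
code. The Python sketch below is an illustration under assumptions: the field order
I/G, 2X, 6X, 3X, NX and the names RADICES, pack and unpack are hypothetical.

```python
# Radix of each field, in the assumed order I/G, 2X, 6X, 3X, NX.
RADICES = (2, 2, 2, 2, 3)   # NX takes three values in this sketch


def pack(values, radices=RADICES):
    """Pack mixed-radix digits (most significant first) into one integer."""
    if len(values) != len(radices):
        raise ValueError("one value is required per field")
    code = 0
    for value, radix in zip(values, radices):
        if not 0 <= value < radix:
            raise ValueError(f"value {value} out of range for base {radix}")
        code = code * radix + value
    return code


def unpack(code, radices=RADICES):
    """Recover the digits from an integer produced by pack()."""
    digits = []
    for radix in reversed(radices):
        code, digit = divmod(code, radix)
        digits.append(digit)
    return tuple(reversed(digits))


# NX: 0 = free amine, 1 = N-acetylation, 2 = N-sulfation.
code = pack((1, 0, 0, 0, 2))
```

A design consequence worth noting is that a base-three field no longer lines up with a
single bit position, so a field stored this way would have to be unpacked before it
could be compared.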
Referring to Fig. 1, a user may perform a query on the polymer database 102 to
search for particular information. For example, a user may search the polymer database
102 for specified polymers, specified chemical units, or polymers or chemical units
having specified properties. A user may provide to a query user interface 108 user input
106 indicating properties for which to search. The user input 106 may, for example,
indicate one or more chemical units, a polymer of chemical units or one or more
properties to search for using, for example, a standard character-based notation. The
query user interface 108 may, for example, provide a graphical user interface (GUI)
which allows the user to select from a list of properties using an input device such as a
keyboard or a mouse.

The query user interface 108 may generate a search query 110 based on the user
input 106. A search engine 112 may receive the search query 110 and generate a mask
114 based on the search query. Example formats of the mask 114, and example
techniques to determine whether properties specified by the mask 114 match properties
of polymers in the polymer database 102, are described in more detail below in
connection with Fig. 3.

The search engine 112 may determine whether properties specified by the mask
114 match properties of polymers stored in the polymer database 102. Subsequently, the
search engine 112 may generate search results 116 based on the search indicating
whether the polymer database 102 includes polymers having the properties specified by
the mask 114. The search results 116 also may indicate polymers in the polymer
database 102 that have the properties specified by the mask 114. For example, if the user
input 106 specified properties of a chemical unit, the search results 116 may indicate
which polymers in the polymer database 102 include the specified chemical unit.
Alternatively, if the user input 106 specified particular chemical unit properties, the
search results 116 may indicate polymers in the polymer database 102 that include
chemical units having the specified chemical unit properties. Similarly, if the user input
106 specified particular polymer properties, the search results 116 may indicate which
polymers in the polymer database 102 have the specified polymer properties.

Fig. 3 is a flowchart illustrating an example of a process 300 that [...]. In act 302,
the search engine 112 may receive a search query 110 from the query user interface 108.
Next, in act 304, the search engine 112 may generate a mask 114 [...] search query 110.
In a following act 306, the search engine 112 may perform a binary operation [...]. In
act 308, the search engine 112 may generate the search results 116 based on the
results of the binary operation performed in step 306.

The process 300 will now be described in more detail with respect to an
[...]. In act 302, the received search query 110 may indicate to search the polymer
database 102 for a particular chemical unit, e.g., the chemical unit [...]. If the coding
scheme shown in Table [...] is used to encode chemical units in the polymer database, the
chemical unit [...] the values of the bits of the mask 114 may specify the [...]
indicate that the 2X position is sulfated.

The search engine 112 may use this mask [...]. The search engine
112 may perform a logical AND operation on each chemical unit of each of the [...].
If the result of the logical AND operation on a particular chemical unit is equal to the
value of the mask [...], then the chemical unit may satisfy the search query 110 and, in
act 308, the search engine 112 may indicate a successful match in the search results 116.
The search engine 112 may generate additional information in the search results [...]
polymer containing the matching chemical unit [...].

In response to receiving the search query in act 302, [...] the search engine
112 also may generate the mask 114 that [...]. The search
engine 112 may set each bit position in the mask corresponding to a property specified by
the search query to the value specified by the search query. Consider, for example, a
search query 110 that indicates a search for all chemical units in which both the 2X
position and the 6X position are sulfated. To generate a mask corresponding to this
search query, the search engine 112 may set the bit positions of the mask corresponding
to the 2X and 6X positions to a value corresponding to being sulfated. Using the coding
scheme shown above in Table 2, for example, in which the 2X and 6X positions have bit
positions of 3 and 2 (counting from the rightmost position beginning at bit position
zero), respectively, the mask corresponding to this search query is 01100. The two bits
of this mask that have a value of one correspond to the bit positions in Table 2
corresponding to the 2X and 6X positions.
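
Mask generation for this query can be sketched in a few lines of Python. The bit
positions follow the text (2X at position 3, 6X at position 2, counting from the
rightmost bit as position zero); the positions given for the other fields, and the names
BIT_POSITION and build_mask, are assumptions made for illustration.

```python
# Bit positions counted from the rightmost bit (position zero).
BIT_POSITION = {"I/G": 4, "2X": 3, "6X": 2, "3X": 1, "NX": 0}


def build_mask(properties):
    """Set the bit of each requested property to one."""
    mask = 0
    for name in properties:
        mask |= 1 << BIT_POSITION[name]
    return mask


mask = build_mask(["2X", "6X"])
mask_bits = format(mask, "05b")   # five-character binary string
```

Here mask_bits holds the same five-bit pattern, 01100, that the text gives for this
query.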

To determine whether the one or more properties of a particular chemical unit in
the polymer database 102 match the one or more properties specified by the mask 114,
the search engine 112 may perform a logical AND operation on the chemical unit
identifier of the chemical unit in the polymer database 102 using the mask 114. To
generate search results for this chemical unit (i.e., act 308), the search engine 112 may
compare the result of the logical AND operation to the mask 114. If the values of the bit
positions of the logical AND operation corresponding to the properties specified by the
search query are equal to the values of the same bit positions of the mask 114, then the
chemical unit has the properties specified by the search query 110, and the search engine
112 indicates a successful match in the search results 116.

For example, consider the search query 110 described above, which indicates a
search for all chemical units in which both the 2X position and the 6X position are
sulfated. Using the coding scheme of Table 2, the bit positions corresponding to the 2X
and 6X positions are bit positions 3 and 2. Therefore, after performing a logical AND
operation on the chemical unit identifier of a chemical unit using the mask 114, the
search engine 112 compares bit positions 3 and 2 of the result of the logical AND
operation to bit positions 3 and 2 of the mask. If the values in both bit positions are
equal, then the chemical unit has the properties specified by the mask 114.
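
The AND-and-compare test can be sketched as follows in Python. The function names
matches and search, and the database contents, are hypothetical; the sketch assumes
that each polymer is stored as a list of five-bit chemical unit identifiers.

```python
def matches(unit_id, mask):
    """True when every bit set in the mask is also set in unit_id."""
    return (unit_id & mask) == mask


def search(database, mask):
    """Return names of polymers containing at least one matching unit.

    database: dict of polymer name -> list of 5-bit chemical unit IDs.
    """
    return [name for name, units in database.items()
            if any(matches(u, mask) for u in units)]


# Hypothetical database entries; the names and codes are made up.
database = {
    "polymer_1": [0b01100, 0b00001],
    "polymer_2": [0b01000, 0b10100],
    "polymer_3": [0b11101],
}
hits = search(database, 0b01100)   # 2X and 6X both set
```

This comparison only tests positions whose mask bit is one. A query that also required
some position to be zero would need an additional comparison, which this sketch does
not attempt.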

The techniques described above for generating the mask 114 and searching with a
mask 114 also may be used to perform searches with respect to sequences of chemical
units or entire polymers. For example, if the search query 110 indicates a sequence of
chemical units, the search engine 112 may fill the mask 114 with a sequence of bits
[...] search results 116 as described above.

Using the techniques described above, an identifier (e.g., the identifier 204a) may
be compared to the search query using a single binary operation (e.g., a binary AND
operation). As described above, conventional notation systems that use character-based
notation to encode sequences of chemical units (e.g., systems which encode DNA
sequences as sequences of characters) typically search for a subsequence of characters
within a second sequence of characters. Because the speed of the binary operation is
independent of the length of the query sequence, searches may be performed more
quickly than conventional character-based methods whose speed is related to the length
of the query sequence. Further, the binary operations used by the search engine 112 may
be performed more quickly because conventional computer processors are designed to
perform binary operations on binary data.
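
A minimal sketch of a sequence-level search follows, offered only as one possible reading of the technique. It assumes that each chemical unit identifier occupies five bits and that the identifiers of a sequence are concatenated into a single integer, first unit most significant; the packing scheme and the names `pack` and `find_matches` are assumptions and are not taken from the description above.

```python
BITS_PER_UNIT = 5  # assumed width of one chemical unit identifier


def pack(units):
    """Concatenate unit identifiers into one integer, first unit most significant."""
    value = 0
    for unit in units:
        value = (value << BITS_PER_UNIT) | unit
    return value


def find_matches(polymer_units, query_masks):
    """Return the start indices at which every unit satisfies its query mask."""
    query = pack(query_masks)
    width = len(query_masks)
    hits = []
    for start in range(len(polymer_units) - width + 1):
        window = pack(polymer_units[start:start + width])
        # One AND and one comparison per alignment, whatever the query length.
        if (window & query) == query:
            hits.append(start)
    return hits


if __name__ == "__main__":
    polymer = [0b01101, 0b01000, 0b01100, 0b01111]
    # Query: a unit with 2X and 6X sulfated followed by a unit with 2X sulfated.
    print(find_matches(polymer, [0b01100, 0b01000]))
```

In this sketch the window is re-packed at each alignment for clarity; an implementation concerned with speed could instead shift a pre-packed polymer integer.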

A further advantage of the techniques described above for searching using binary
operations is that encoding one or more properties of a polymer into the notational
representation of the polymer enables the search engine 112 to quickly and directly
search the polymer database 102 for particular properties of polymers. Because the
properties of a polymer are encoded into the polymer's notational representation, the
search engine 112 may determine whether the polymer has a specified property by
determining whether the specified property is encoded in the polymer's notational
representation. For example, as described above, the search engine 112 may determine
whether the polymer has the specified property by performing a logical AND operation
on the polymer's notational representation using the mask 114. This operation may be
performed quickly by conventional computer processors and may be performed using
only the polymer's notational representation and the mask, without reference to
additional information about the properties of the polymer.

Some aspects of the techniques described herein for representing properties using
binary notation may be useful for generating, searching and manipulating information
about polysaccharides. Accordingly, a complete building block of a polymer may be
assigned a unique numeric identifier, which may be used to classify the complete
building block. For example, each numeric identifier may represent a complete building
block of a polysaccharide, including the exact chemical structure as defined by the basic
building block of a polysaccharide and all of its substituents, charges, etc. A basic
building block refers to a basic ring structure such as iduronic acid or glucuronic acid but
does not include substituents, charges, etc. Such building block information may be
generated and processed in a same or similar manner as described above with respect to
"properties" of polymers.

A computer system that may implement the system 100 of FIG. 1 as a computer
program typically may include a main unit connected to both an output device which
displays information to a user and an input device which receives input from a user. The
main unit generally includes a processor connected to a memory system via an
interconnection mechanism. The input device and output device also are connected to
the processor and memory system via the interconnection mechanism.

One or more output devices may be connected to the computer system. Example
output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD),
printers, communication devices such as a modem, and audio output. One or more input
devices also may be connected to the computer system. Example input devices include
data input devices such as sensors. The subject matter disclosed herein is not limited to the
particular input or output devices described herein.

The computer system may be a general purpose computer system which is
programmable using a computer programming language, or another
language, such as a scripting language or assembly language. The processor
typically is a commercially available processor, of which processors from
Cyrix, the 680X0 series microprocessors available from Motorola, the PowerPC
microprocessor from IBM and the Alpha-series processors from Digital Equipment
Corporation, are examples. Many other processors are available. Such a processor
executes a program called an operating system, of which VMS and OS8 are examples,
which controls the execution of other computer programs and provides
compilation, storage assignment, data management and memory management, and
communication control and related services. The processor and operating system define
a computer platform for which application programs in high-level programming
languages may be written.

A memory system typically includes a computer readable and writeable
nonvolatile recording medium, of which a magnetic disk, a flash memory and tape are
examples. The disk may be removable, such as a "floppy disk," or permanent, known as
a hard drive. A disk typically stores signals in
binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may
define an application program to be executed by the processor, or information
stored on the disk to be processed by the application program. Typically, in operation,
the processor causes data to be read from the nonvolatile recording medium into an 
integrated circuit memory element, which is typically a volatile, random access memory 
such as a dynamic random access memory (DRAM) or static memory (SRAM). The 
integrated circuit memory element typically allows for faster access to the information by 
the processor than does the disk. The processor generally manipulates the data within 
the integrated circuit memory and then copies the data to the disk after processing is 
completed. A variety of mechanisms are known for managing data movement between 
the disk and the integrated circuit memory element, and the subject matter disclosed 
herein is not limited to such mechanisms. Further, the subject matter disclosed herein is 
not limited to a particular memory system. 

The subject matter disclosed herein is not limited to a particular computer 
platform, particular processor, or particular high-level programming language. 
Additionally, the computer system may be a multiprocessor computer system or may 
include multiple computers connected over a computer network. It should be understood 
that each module (e.g. 110, 120) in FIG. 1 may be separate modules of a computer 
program, or may be separate computer programs. Such modules may be operable on 
separate computers. Data (e.g., 104, 106, 110, 114 and 116) may be stored in a memory
system or transmitted between computer systems. The subject matter disclosed herein is 
not limited to any particular implementation using software or hardware or firmware, or 
any combination thereof. The various elements of the system, either individually or in 
combination, may be implemented as a computer program product tangibly embodied in 
a machine-readable storage device for execution by a computer processor. Various steps 
of the process may be performed by a computer processor executing a program tangibly 
embodied on a computer-readable medium to perform functions by operating on input 
and generating output. Computer programming languages suitable for implementing 
such a system include procedural programming languages, object-oriented programming
languages, and combinations of the two. 

Referring to FIG. 4, a system 400 for sequencing polymers is shown. The system 
400 includes a polymer database 402 which includes a plurality of records storing 
information corresponding to a plurality of polymers. Each of the records may store 
information about properties of the corresponding polymer, properties of the 
corresponding polymer's constituent chemical units, or both. The polymers for which
information is stored in the polymer database 402 may be any kind of polymers. In
one embodiment, each of the records in the polymer database 402 includes a polymer
identifier (ID) that identifies the polymer corresponding to the record. Polymers may be
represented in the polymer database in other ways. For example, records in the polymer
database may include only a polymer ID or may only include chemical unit IDs.
The polymer database 402 may be any kind of storage capable of storing
information about polymers as described herein. For example, the polymer database 402
may be a flat file, a relational database, a table in a database, an object or structure in a
computer-readable volatile or non-volatile memory, or any data accessible to a computer
program, such as data stored on a computer-readable storage medium.

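In one embodiment, a polymer ID includes a plurality of fields for storing
information about properties of the polymer identified by the polymer ID. Similarly, in
one embodiment, chemical unit IDs include a plurality of fields for storing information
about properties of the chemical unit identified by the chemical unit ID. Although the
following description refers to the fields of chemical unit IDs, such description is
equally applicable to the fields of polymer IDs.

The fields of a chemical unit ID may store any kind of value that is capable of
being stored in a computer-readable medium, such as a binary value or a hexadecimal
value. The fields may store information about any properties of the corresponding
chemical unit.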

A compositional analyzer 408 receives as input a sample polymer 406 and
generates as output polymer composition data 410 that is descriptive of the composition
of the sample polymer 406. A compositional analyzer as used herein is any type of
equipment or experimental procedure that may be used to identify a property of a
polymer, such as a chromatograph. The polymer composition data 410 includes
information about the sample polymer 406 and the number of chemical units in the
sample polymer 406. A sequencer 412
generates a candidate list 416 of a subpopulation of polymers that might match the
sample polymer 406 in the process of sequencing the sample polymer 406 using
information contained in a mass line 414 and the polymer database 402. A candidate list
is also referred to herein as a "population" of polymers. At the end of the sequencing
process, the candidate list 416 contains zero or more polymers that correspond to the
sample polymer 406. A subpopulation of polymers is defined as a set of polymers
having at least two properties in common with a sample polymer. It is useful to identify
subpopulations of polymers in order to have an information set with which to compare
the sample polymer 406.

Consider, for example, the sequence DD7DAD-7, which is a tetradecasaccharide
(14 mer) of HLGAG containing 20 sulfate groups. The compositional analyzer 408 may,
for example, perform compositional analysis of DD7DAD-7 by degrading the sequence
to its disaccharide building blocks and analyzing the relative abundance of each unit
using capillary electrophoresis to generate the polymer composition data 410. The
polymer composition data 410 in this case would show a major peak corresponding to
±D, a peak about 1/2 the size of the major peak corresponding to ±7 and another peak
about 1/4 the size of the major peak corresponding to ±A. Note that the ± sign is used
because degradation by heparinase would create a double bond between the C4 and C5
atoms in the uronic acid ring thereby leading to the loss of the iduronic vs. glucuronic
acid information. From the polymer composition data 410, it may be inferred that there
are 4 ±Ds, 2 ±7s and a ±A in the sequence.

Referring to FIG. 5, a process 500 that may be performed by the sequencer 412 to
sequence the sample polymer 406 is shown. The sequencer 412 receives the polymer
composition data 410 from the compositional analyzer 408. The sequencer 412 uses the
polymer composition data 410 and the information contained in the polymer database
402 to generate an initial candidate list 416 of all possible polymers: (1) having the same
length as the sample polymer 406 and (2) having the same constituent chemical units as
the sample polymer 406 (step 504).

For example, consider the sequence DD7DAD-7 mentioned above. The polymer
composition data 410 indicates that the sequence includes 4 ±Ds, 2 ±7s and one ±A, and
indicates that the length of the sample polymer 406 is seven. In this case, step 504
(generation of the candidate list 416) involves generating all possible sequences having
the same length as the sample polymer 406 and having 4 ±Ds, 2 ±7s and a ±A. In one
embodiment, the sequencer 412 generates all possible sequences of length seven having
4 ±Ds, 2 ±7s and a ±A using standard combinatoric methods.
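
One standard combinatoric method that could be used for step 504 is enumeration of the distinct orderings of the composition, sketched below. This is an illustrative assumption rather than the method of the sequencer 412: the sketch treats each unit as an unsigned label and therefore ignores the ± (iduronic vs. glucuronic) ambiguity, and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def candidate_sequences(composition):
    """Return every distinct ordering of the units in the composition."""
    units = [unit for unit, count in composition.items() for _ in range(count)]
    # permutations() repeats orderings when units are duplicated, so the
    # results are collected in a set before sorting.
    return sorted(set(permutations(units)))


def expected_count(composition):
    """Multinomial coefficient: n! divided by the factorial of each unit count."""
    total = factorial(sum(composition.values()))
    for count in composition.values():
        total //= factorial(count)
    return total


if __name__ == "__main__":
    composition = {"D": 4, "7": 2, "A": 1}
    sequences = candidate_sequences(composition)
    print(len(sequences), expected_count(composition))
```

For longer polymers a generator that avoids materializing duplicate permutations would be preferable; the set-based form is used here only because it is short.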

For each of the polymers in the candidate list 416, the sequencer 412 calculates the
value of a predetermined property. The predetermined property may, for example, be
the mass of the polymer. The sequencer 412 compares the calculated values of the
predetermined property to the value of the predetermined property of the sample
polymer 406 (step 508).

The sequencer 412 eliminates candidate polymers from the candidate list 416 using
experimental constraints. An experimental constraint as used herein is a biochemical
process performed on a polymer which may result in modification to the polymer which
may be detected (step 512). The sequencer 412 eliminates from the candidate list 416
polymers having property values that do not match the property values of the
experimental results 422 (step 514).

If the size of the candidate list 416 is less than a predetermined threshold (e.g., 1)
(step 516), then the sequencer 412 is done (step 518). The candidate list 416 may contain
zero or more polymers, depending upon the contents of the polymer
database 402 and the value of the predetermined threshold. If the size of the candidate
list 416 is not less than the predetermined threshold (step 516), steps 510-516 are
repeated until the size of the candidate list 416 falls below the predetermined threshold.
When the sequencer 412 is done (step 518), the sequencer 412 may, for example, display
the candidate list 416 to the user on an output device such as a computer monitor.
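
The iterative elimination described above may be sketched as follows. The sketch is hypothetical in several respects: the cleavage rule (cutting after every unit named in `cut_after`) is a toy stand-in for an experimental constraint and is not a real enzyme specificity, fragment masses are taken as simple sums of illustrative unit masses, and all names are introduced here for illustration only.

```python
# Illustrative unit masses for the units coded D and 5 (assumed values).
UNIT_MASS = {"D": 577.47, "5": 497.41}


def fragment_masses(sequence, cut_after):
    """Cut after every unit in cut_after and return the sorted fragment masses."""
    fragments, current = [], []
    for unit in sequence:
        current.append(unit)
        if unit in cut_after:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return sorted(sum(UNIT_MASS[u] for u in frag) for frag in fragments)


def masses_match(predicted, observed, tolerance=0.01):
    """Compare two mass lists element by element within a tolerance."""
    observed = sorted(observed)
    return len(predicted) == len(observed) and all(
        abs(p - o) <= tolerance for p, o in zip(predicted, observed)
    )


def eliminate(candidates, experiments, threshold=2):
    """Apply experiments in turn until fewer than `threshold` candidates remain."""
    remaining = list(candidates)
    for cut_after, observed in experiments:
        if len(remaining) < threshold:
            break
        remaining = [
            c for c in remaining
            if masses_match(fragment_masses(c, cut_after), observed)
        ]
    return remaining


if __name__ == "__main__":
    candidates = ["DDD5", "DD5D", "D5DD", "5DDD"]
    observed = fragment_masses("D5DD", {"5"})  # stands in for measured data
    print(eliminate(candidates, [({"5"}, observed)]))
```

Each experiment narrows the list only by comparing predicted and observed values, so the loop can stop as soon as the list falls below the chosen threshold.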

Referring to FIG. 6, in another embodiment, the sequencer 412 uses a genetic
algorithm process 600 to generate the initial candidate list 416 and to modify the
candidate list 416 in order to arrive at a final candidate polymer that identifies the
sequence of the sample polymer 406. The sequencer 412 generates a population of
random sequences with the composition indicated by the polymer composition data 410
and having the same length as the sample polymer 406 (step 602). The sequencer 412
evaluates the fitness (score) of the polymers in the candidate list 416 using a scoring
function based on the enzymatic degradation of enzyme ENZ (step 604). The genetic
algorithm process 600 uses the fitness values to decide which of the sequences in the
candidate list 416 can survive into the next generation and which of the sequences in the
candidate list 416 has the highest chance of producing other sequences of equal or higher
fitness by cross-over and mutation. The sequencer 412 then performs cross-over and
mutation operations that select for fit sequences in the candidate list 416 into the next
generation (step 606). If at least a predetermined number (e.g., three) of generations of
the candidate list 416 include copies of the correct sequence with the maximum fitness
(step 608), then the sequencer 412 is done sequencing. Otherwise, the sequencer 412
repeats steps 604-606 until the condition of step 608 is satisfied. Cross-over and
mutation operations are used by genetic algorithms to randomly sample the different
regions of a search space.

In one embodiment, steps 510 and 512 are automated (e.g., carried out by a
computer). For example, after the initial candidate list 416 has been generated (step
508), the sequencer 412 may divide the candidate list 416 into categories (the categories
are preferably based on properties), such as hepI cleavable, hepIII cleavable, and nitrous
acid cleavable (the property is enzymatic sensitivity). The sequencer 412 may then
simulate the corresponding degradation or modification of the sequences present in each
of the categories and search for those sequences that give fragments of unique masses.
Based on the population of sequences that can give fragments of unique masses upon
degradation or modification, the sequencer 412 chooses the particular enzyme or
chemical as the experimental constraint to eliminate candidate polymers from the
candidate list 416. Although particular experimental constraints are described herein,
other experimental constraints such as enzymes may be used, including
exoenzymes and other HLGAG degrading chemicals.

In another embodiment, the sequencer 412 uses chemical characteristics to guide
the choice of experimental constraint. For example, normalized frequencies of chemical
units of known polymers containing I2S, G, HNS, and HNAc may be calculated. For
example, a normalized frequency f(x) of a chemical unit x may be calculated (e.g.,
relative to the number of disaccharide units). An example set of
sequences analyzed in this way is shown in Table 6 below.

Constraints used for convergence

TABLE 6

The "constraints used for convergence" column indicates constraints that have
been shown to achieve convergence for the corresponding known sequence.
Once compositional analysis has been performed on a sample polymer, the relative
frequencies of I2S, G, HNS, and HNAc in the sample sequence are compared to
the relative frequencies of the known sequences using the table above. To select a set
of experimental constraints, a known sequence with relative frequencies that are similar
to those of the sample polymer is identified in the table above, and the constraints used
for convergence of that known sequence are applied to the sample polymer 406.

For example, Table 6 demonstrates that f(G) and f(HNAc) are
important factors in the decision to use hepIII and nitrous acid, because nitrous acid clips
after a HNS, and hepIII clips after a disaccharide unit containing G. The disaccharide unit
I2S-HNS,6S is the dominant unit in heparin-like regions (i.e., highly-sulfated regions) of
the HLGAG chains. Therefore, if a sequence is more heparin-like, then hepI may be
chosen as the default enzyme and the information content present in chemical units
containing G and HNAc becomes important for choosing enzymes and chemicals other
than hepI. Similarly, for low-sulfated regions on HLGAG chains, hepIII may be a
default enzyme and f(I2S) and f(HNS) become important for choosing hepI and nitrous
acid. Similarly, one may also calculate the positional sulfate or acetate distribution along
the chain and generate the criterion for using the sulfotransferases or sulfatases for
convergence.

The polymer database 402 may include information indicating that sulfation at a
position of a polymer contributes 80.06D to the mass of the polymer and that substitution
of a sulfate for an acetate contributes an additional 38.02D to the mass of the polymer.
Therefore, the mass M of any polymer in the polymer database 402 may be calculated
using the following formula:

M = 379.33 + [0 80.06 80.06 80.06 38.02] * C,

where C is the vector containing the binary representation of the polymer and * is a
vector multiplication operator. For example, the mass of the disaccharide unit I2S-HNS,6S,
having a binary representation of 01101, would be equal to 379.33 + [0 80.06 80.06
80.06 38.02] * [0 1 1 0 1] = 379.33 + 0*0 + 80.06*1 + 80.06*1 + 80.06*0 + 38.02*1 =
577.47D.
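
The mass formula above may be evaluated directly from the binary representation, as in the following sketch. The constants are those given in the formula; the function name `unit_mass` and the use of a bit string as input are assumptions made for illustration.

```python
BASE_MASS = 379.33
# Contribution of each bit, most significant bit first, as in the formula above.
BIT_CONTRIBUTIONS = (0.0, 80.06, 80.06, 80.06, 38.02)


def unit_mass(bits: str) -> float:
    """Return M = 379.33 + [0 80.06 80.06 80.06 38.02] * C for a 5-bit string C."""
    if len(bits) != len(BIT_CONTRIBUTIONS) or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of five binary digits")
    return BASE_MASS + sum(
        weight for weight, bit in zip(BIT_CONTRIBUTIONS, bits) if bit == "1"
    )


if __name__ == "__main__":
    print(round(unit_mass("01101"), 2))
```

Because the mass depends only on which bits are set, it can be computed from the identifier alone, without consulting any additional structural information.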

HLGAG fragments may be degraded using enzymes such as heparin lyase
enzymes or nitrous acid and they may also be modified using different enzymes that
transfer sulfate groups to the positions mentioned earlier or remove the sulfate groups
from those positions. The modifying enzymes are exolytic and non-processive, which
means that they act just once on the non-reducing end and let go of the heparin chain
without sequentially modifying the rest of the chain. For each of the modifiable
positions in the disaccharide unit there exists a modifying enzyme. An enzyme that adds
a sulfate group is called a sulfotransferase and an enzyme that removes a sulfate group is
called a sulfatase. The modifying enzymes include 2-O sulfatase/sulfotransferase, 3-O
sulfatase/sulfotransferase, 6-O sulfatase/sulfotransferase and N-deacetylase-N-
sulfotransferase. The function of these enzymes is evident from their names. For
example, a 2-O sulfotransferase transfers a sulfate group to the 2-O position of an
iduronic acid, and a 2-O sulfatase removes the sulfate group from the 2-O position of an
iduronic acid.

HLGAG degrading enzymes include heparinase-I, heparinase-II, heparinase-III,
D-glucuronidase and L-iduronidase. The heparinases cleave at the glycosidic linkage
before a uronic acid. Heparinase I clips at a glycosidic linkage before a 2-O sulfated
iduronic acid. Cleavage by the heparinases creates a double bond in the uronic acid
before which cleavage occurs, leading to the loss of the information of iduronic vs.
glucuronic acid. Glucuronidase and iduronidase, as their names suggest, cleave at
glucuronic acid and iduronic acid residues, respectively. Nitrous acid cleavage converts
the six-membered hexosamine ring to a five-membered anhydromannitol ring.

Simulation of the activity of an enzyme on a sequence may be carried out using
simple rules to determine the fragments that would be formed from the enzymatic
activity.

The methods may be used to analyze polysaccharide sequences and the
structure-function relationships of these molecules. For example, through digestion of
LMWH and analysis by CE and MALDI-MS, we may obtain a "digest spectrum" of
various preparations of LMWH, thus deriving information about the composition
thereof. Such information is of value in terms of quality control for LMWH
preparations.

The methods are also useful for understanding the role of HLGAGs in 
fundamental biological processes. Already MS has been used to look at the presence of 
various proteins as a function of time in Drosophila development. In a similar fashion 
HLGAG expression can be examined as a function both of position and of time in Drosophila
development. Similarly the methods may be used as a diagnostic tool for human 
diseases. There is a group of human diseases called mucopolysaccharidosis (MPS). The 
molecular basis for these diseases is mostly in the degradation pathway for HLGAGs. 
For instance, mucopolysaccharidosis type I involves a defect in iduronidase, which clips 
unsulfated iduronate residues from HLGAG chains. Similarly, persons suffering from 
mucopolysaccharidosis type II (MPS II) lack iduronate-2-sulfatase. In each of these 
disorders, marked changes in the composition and sequence of cell surface HLGAGs
occur. Our methodology could be used as a diagnostic for these disorders to identify
which MPS syndrome a patient is suffering from. 

Additionally the methods of the invention are useful for mapping protein binding 
HLGAG sequences. Analogous to fingerprinting DNA, the MALDI-MS sequencing 
approach may be used to specifically map HLGAG sequences that bind to selected 
proteins. This is achieved by sequencing the HLGAG chain in the presence of a target 
protein as well as in the absence of the particular protein. In this manner, sequences 
protected from digestion are indicative of sequences that bind with high affinity to the 
target protein. 

The methods of the invention may be used to analyze branched or unbranched
polymers. Analysis of branched polymers is more difficult than analysis of unbranched
polymers because branched carbohydrates are "information dense" molecules.
Branched polysaccharides include a few building blocks that can be combined in several
different ways, thereby coding for many sequences. For instance, a trisaccharide, in
theory, can give rise to over 6 million different sequences. The methods for analyzing
branched polysaccharides, in particular, are advanced by the creation of an efficient
nomenclature that is amenable to computational manipulation. Thus, an efficient
nomenclature for branched sugars that is amenable to computational manipulation has
been developed according to the invention. Two types of numerical schemes that may
encode the sequence information of these polysaccharides have been developed in order to
bridge the widely used graphic representation and the numerical
scheme discussed below.

a. Binary Notation Scheme: The first notation scheme is
based on a binary numerical system. The binary representation in conjunction with a
tree-traversing algorithm is used to represent all the possible combinations of the
branched polysaccharides. The nodes (branch points) are easily amenable to
computational searching through tree-traversing algorithms (Figure 7A). Figure 7A
shows a notation scheme for branched sugars. Each monosaccharide unit can be
represented as a node (N) in a tree. The building blocks can be defined as either (A),
(B) or (C) where N1, N2, N3, and N4 are individual monosaccharides. Each of these
combinations can be coded numerically to represent building blocks of information. By
defining glycosylation patterns in this way, there are several tree traversal and searching
algorithms in computer science that may be applied to solve this problem.

A simpler version of this notational scheme is shown in Figure 7B. This simplified
version may be extended to include other possible modifications. The structures include
a core (the trimannosyl chitobiose moiety) and up to four branched chains from the core,
together with modifications such as addition of fucose to the core, fucosylation of the
GlcNac in the branches, or modifications on the branches. Thus, the superfamily of
N-linked polysaccharides can be represented by modular units: a) the core region, which
may be modified with a GlcNac, b) number of branches: up to four branched chains, each
with GlcNac, Gal and Neu, and c) modifications of the branch sugars. These modular units
may be systematically combined to generate all possible combinations of the
polysaccharide. Representation of the branches and the sequences within the branches
can be performed as an n-bit binary code (0 and 1) where n is the number of
monosaccharides in the branch. Figure 7C depicts a binary code containing the entire
information regarding the branch. Since there are up to four branches possible, each
branch can be represented by a 3-bit binary code, giving a total of 12 binary bits. The
first bit represents the presence (binary 1) or absence (binary 0) of the GlcNac residue
adjoining the core. The second and the third bit similarly represent the presence or
absence of the Gal and the Neu residues in the branch. Hence a complete chain
containing GlcNac-Gal-Neu is represented as binary (111), which is equivalent to
decimal 7. Four of the branches can then be represented by a 4-digit decimal code, the 1st
digit of the decimal code for the first branch and the 2nd, the second branch etc. (right).

This simple binary code does not contain the information regarding the linkage (α
vs. β and the 1-6 or 1-3 etc.) to the core. This type of notation scheme, however, may be
easily expanded to include additional bits for branch modification. For instance, the
presence of a 2-6 branched neuraminic acid to the GlcNac in the branch can be encoded
by a binary bit.
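
The branch encoding described above may be sketched as follows, assuming the bit order stated in the text (GlcNAc first, then Gal, then Neu). The function names, the use of sets to describe branches, and the padding of absent branches with 0 are assumptions made for illustration.

```python
RESIDUES = ("GlcNAc", "Gal", "Neu")  # first, second and third bit of a branch


def encode_branch(residues_present):
    """Return the 3-bit code of one branch as an integer from 0 to 7."""
    code = 0
    for residue in RESIDUES:
        code = (code << 1) | int(residue in residues_present)
    return code


def encode_branches(branches, max_branches=4):
    """Return one decimal digit per branch; absent branches are encoded as 0."""
    if len(branches) > max_branches:
        raise ValueError("too many branches")
    padded = list(branches) + [set()] * (max_branches - len(branches))
    return "".join(str(encode_branch(branch)) for branch in padded)


def decode_branch(code):
    """Return the residues whose bits are set in a 3-bit branch code."""
    top = len(RESIDUES) - 1
    return [r for i, r in enumerate(RESIDUES) if code & (1 << (top - i))]


if __name__ == "__main__":
    branches = [{"GlcNAc", "Gal", "Neu"}, {"GlcNAc", "Gal"}, {"GlcNAc"}]
    print(encode_branches(branches))
```

An additional bit per branch, as suggested above for branch modifications, would widen each branch code without changing the approach.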

b. Prime Decimal Notation Scheme: Similar to the binary notation described
above, a second computationally friendly numerical system, which involves the use of a
prime number scheme, has been developed. The algebra of prime numbers is extensively
used in areas of encoding, cryptography and computational data manipulations. The
scheme is based on the theorem that a number has a uniquely-definable
set of prime divisors. In this way, composition information may be rapidly and
accurately analyzed.

This scheme is illustrated by the following example. The prime numbers 2, 3, 5,
7, 11, 13, 17, 19, and 23 are assigned to nine common building blocks of
polysaccharides. The composition of a polysaccharide chain may then be represented as
the product of the prime decimals that represent each of the building blocks. For
illustration, GlcNac is assigned the number 3 and mannose the number 2. The core is
represented in this scheme as 2×2×2×3×3 = 72 (3 mannose and 2 GlcNacs). This
notation, therefore, relies on the mathematical principle that 72 can be expressed ONLY
as the combination of three 2s and two 3s. The prime divisors are therefore unique and
can encode the composition information. Determining the prime divisors becomes a
problem when one gets to very large numbers, but this is not an issue for the size of
numbers encountered in this analysis.
From this number the mass of the polysaccharide chain can be determined.
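
The prime notation may be illustrated with the following sketch, which encodes a composition as a product of primes and recovers it by repeated division. Only the two assignments given above (mannose as 2 and GlcNAc as 3) are used; the function names are hypothetical, and the remaining seven assignments are left out because they are not stated.

```python
# Assignments given in the text; the other seven building blocks are not listed.
PRIME_FOR_BLOCK = {"mannose": 2, "GlcNAc": 3}


def encode_composition(counts):
    """Return the product of the primes of all building blocks in the chain."""
    code = 1
    for block, count in counts.items():
        code *= PRIME_FOR_BLOCK[block] ** count
    return code


def decode_composition(code):
    """Recover the building block counts from a composition code."""
    if code < 1:
        raise ValueError("composition code must be a positive integer")
    counts = {}
    for block, prime in PRIME_FOR_BLOCK.items():
        while code % prime == 0:
            counts[block] = counts.get(block, 0) + 1
            code //= prime
    if code != 1:
        raise ValueError("code contains a prime with no assigned building block")
    return counts


if __name__ == "__main__":
    core = {"mannose": 3, "GlcNAc": 2}
    print(encode_composition(core), decode_composition(72))
```

Note that the code records composition only; the order of the building blocks in the chain is not recoverable from the product.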

The power of the computational approaches of the notational scheme may be used
to systematically develop an exhaustive list of all possible combinations of the
polysaccharide sequences. For instance, an unconstrained combinatorial list of possible
sequences of size m^n, where m is the number of building blocks and n is the number of
positions in the chain may be used. In Figure 7C, there are 256 different saccharide
combinations that are theoretically possible (4 combinations for each branch and 4
branches = 4^4).

A mass line of the 256 different polysaccharide structures may be plotted. Then
rules may be used to eliminate some of the structures. For example (shown in Figure
7B), it is known that the first step is the addition of GlcNAc, and this residue
should be present for any of the other branches to exist. Similar constraints can
be incorporated at the notation level before generation of the master list. With
the notation scheme in place, experimental data can be generated (such as MALDI-MS
or chromatography) and those sequences that do not satisfy the data can be
eliminated. An iterative procedure therefore enables a rapid convergence to a solution.
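
The unconstrained list of 4^4 = 256 combinations, and the effect of a constraint applied at the notation level, may be sketched as follows. The four per-branch codes (0, 4, 6 and 7, i.e. absent, GlcNAc, GlcNAc-Gal and GlcNAc-Gal-Neu) are an assumption about which four combinations are meant, and the example constraint is hypothetical.

```python
from itertools import product

# Assumed per-branch codes: absent, GlcNAc, GlcNAc-Gal, GlcNAc-Gal-Neu.
BRANCH_CODES = (0, 4, 6, 7)


def enumerate_structures(num_branches=4, constraint=None):
    """List every tuple of branch codes, optionally filtered by a constraint."""
    structures = product(BRANCH_CODES, repeat=num_branches)
    if constraint is None:
        return list(structures)
    return [s for s in structures if constraint(s)]


if __name__ == "__main__":
    print(len(enumerate_structures()))
    # Hypothetical rule: the first branch must be present.
    print(len(enumerate_structures(constraint=lambda s: s[0] != 0)))
```

Filtering before any experimental data are considered shortens the list that later elimination steps have to process.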

To identify branching patterns, a combination of MALDI-MS and CE (or other
techniques) may be used, as shown in the Examples. Elimination of the pendant arms of
the branched polysaccharide may be achieved by the judicious use of exo and
endoenzymes. All antennary groups may be removed, retaining only the GlcNAc
moieties extending from the mannose core and forming an "extended" core. In this way,
information about branching is retained, but separation and identification of glycoforms
is made simpler. One methodology that could be employed to form extended cores for
most polysaccharide structures is the following. First, enzymes are applied to
remove capping and branching groups from the arms. Then application of endo-β-
galactosidase will cleave the arms to the extended core. For more unusual structures,
other exoglycosidases are available, for instance xylases and glucosidases. By addition
of a cocktail of degradation enzymes, any polysaccharide motif may be reduced to its
corresponding "extended" core. Identification of "extended" core structures will be
made by mass spectral analysis. There are unique mass signatures associated with an
extended core motif depending on the number of pendant arms (Figure 7D). Figure 7D
shows a mass line of the "extended" core motifs generated upon digestion of
glycan structures by the enzyme cocktail. Shown are the expected masses of mono-, di-
and tetraantennary structures both with and without a fucose linked α1→6 to the core
GlcNAc moiety (from left to right). All of the "extended" core structures have a unique
mass signature. Quantification
of the various glycan cores present may be completed by capillary electrophoresis, which
has proven to be a highly rapid and sensitive means for quantifying polysaccharide
structures. [Kakehi, K. and S. Honda, Analysis of glycoproteins, glycopeptides and
glycoprotein-derived polysaccharides by high-performance capillary electrophoresis. J
Chromatogr A, 1996. 720(1-2): p. 377-93.]

Examples 

Example 1: Identification of the number of fragments versus the fragment mass for 
Di, Tetra, and Hexasaccharide. 

The masses of all the possible disaccharide, tetrasaccharide and hexasaccharide
fragments were calculated and are shown in the mass line shown in Figure 8. The X axis
shows the different possible masses of the di, tetra and hexasaccharides and the Y axis
shows the number of fragments having that particular mass. Although there is a
considerable overlap between the tetra and hexasaccharides, the minimum difference in
their masses is 13.03D. Note that the Y axis has been broken to omit values between 17
and 40, to show all the bars clearly.

Example 2: Sequencing of an octasaccharide of HLGAG. 

Using hepI, hepII, hepIII, nitrous acid, and exoenzymes, such as 2-sulfatase and
α-iduronidase, β-glucuronidase, N-deacetylase as experimental constraints and the
computer algorithm described above, an octasaccharide (02), two decasaccharides (FGF
binding and ATIII binding) and a hexasaccharide sequence of HLGAG were sequenced.

1. Compositional Analysis of 02:

Compositional analysis of 02 was completed by exhaustive digest of a 30 µM
sample with heparinases I-III and analysis by capillary electrophoresis (CE). Briefly, to
10 µL of polysaccharide was added 200 nM of heparinases I-III in sodium phosphate
buffer pH 7.0. The reaction was allowed to proceed at 30°C overnight. For CE analysis
the sample was brought to 25 µL. Naphthalene trisulfonic acid (2 µM) was run as an
internal standard. Assignments of ΔU2S-HNS,6S and ΔU-HNS,6S were made on the basis
that they comigrated with known standards. The internal standard migrated between 4
and 6 min, the trisulfated disaccharide ΔU2S-HNS,6S migrated between 6 and 8 min and
the disulfated disaccharide ΔU-HNS,6S migrated between 8 and 10 min. Integration of
the peaks indicated that the relative amounts of the two saccharides were 3:1.



The CE data for the 02 octasaccharide demonstrated that there is a major peak
corresponding to a [...] occurring disaccharide and a
small peak that corresponds to a disulfated disaccharide (ΔU-HNS,6S). The relative
abundance of these disaccharide units obtained from the CE data shows that there are 3
Ds (±) and a 5. The number of possible combinations of sequences having these
disaccharide units is 32. The possible combinations are shown in Table 7 below.



Possible sequences

(i) Heparinase I digest

Sequence      Fragments formed
              (577)     (577)     (1074)
±DDD5         ±D        ±D        ±D5
±DDD-5        ±D        ±D        ±D-5
±DD5D         ±D        ±D        ±D5
±DD-5D        ±D        ±D        ±D-5

(ii) Heparinase III digest

Sequence      Fragments formed
              (1732)
±DDD-5        ±DDD      ±5

Table 7
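
The size of a master list follows directly from the disaccharide composition. The sketch below is a minimal Python illustration, not code from the original work: the function name `master_list` is hypothetical, and it assumes that the unit at the non-reducing end is always written ± (its isomeric state is unknown) while every other unit is either + (written without a sign) or -.

```python
from itertools import permutations, product

def master_list(composition):
    """Enumerate every PEN sequence consistent with a disaccharide composition.

    composition: string of unit codes, e.g. "DDD5" for three D units and one 5.
    Assumption: the first unit is written '±'; each later unit is '+' (unmarked) or '-'.
    """
    sequences = []
    # Distinct orderings of the units (repeated units must not be double counted).
    for arrangement in sorted(set(permutations(composition))):
        # One sign choice for each unit after the non-reducing end.
        for signs in product(("", "-"), repeat=len(arrangement) - 1):
            internal = "".join(s + u for s, u in zip(signs, arrangement[1:]))
            sequences.append("±" + arrangement[0] + internal)
    return sequences

octasaccharide = master_list("DDD5")
print(len(octasaccharide))            # 32
print("±DDD-5" in octasaccharide)     # True
```

With four positions for the 5 unit and two isomeric states for each of the three internal units, the list has 4 x 8 = 32 entries. The same function gives 320 sequences for the composition "DDD47", the figure quoted for the AT-III binding decasaccharide.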

2. Digestion of 02 with heparinase I:

Digestion of 02 was completed using both a short procedure and an exhaustive
digest. Short digestion was defined as using 100 nM of heparinase I and a digestion
time of 10 minutes. "Exhaustive" digestion was defined as overnight digestion with 200
nM enzyme. All digests were completed at room temperature. In the case of 02 both
digest conditions yield the same result. Short digestion with heparinase I yields a
pentasulfated tetrasaccharide (no acetyl groups) of m/z 5300.1 (1074.6) and a
disaccharide of m/z 4802.6 (577.1) corresponding to a trisulfated disaccharide. This
profile did not change upon exhaustive digest of 02.

Upon treatment with heparinase I, 02 is clipped to form fragments with m/z values of
4802.6 and 5300.1. From the masses of these fragments it was possible to uniquely
determine that m/z of 4802.6 corresponded to a trisulfated disaccharide and m/z of
5300.1 corresponded to a pentasulfated tetrasaccharide. Since the disaccharide
composition of the sequence was known, the only trisulfated disaccharide that may be
formed is ±D and the possible pentasulfated tetrasaccharides that may be formed are
±5D, ±5-D, ±D5 and ±D-5. After identification of the fragments, the next step was to
arrange them to give the right sequence. Since this was a cumbersome job to handle
manually, a computer simulation was used to progressively eliminate sequences from the
master list that did not fit the experimental data. Using the rule that heparinase I cleaves
before [...] and I2S, the heparinase I digestion was simulated on the computer to generate
the fragments for all the 32 sequences in the master list. From the list of fragments formed
for each sequence, the computer was used to search for fragments that corresponded to
the di- and tetrasaccharide observed from the mass spectrometry data. The sequences that
gave the fragments that fit the mass spec data of hep I are shown in Fig. 8A. It may be
observed from Fig. 8A that all the sequences have 3 Ds, which is consistent with the
known rules for hep I digestion used to produce these fragments. It may also be observed
that two arrangements give the same product profile, namely having the ±5 (I-HNS,6S
or G-HNS,6S) at the reducing end and having ±5 at the second position from the non-
reducing end. To resolve this issue a second experimental constraint, digestion with
hep III, was used.
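
The elimination step described above can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm used in the original work: the helper names (`parse`, `digest`, `fits`) are hypothetical, and the cleavage rule is deliberately simplified to "heparinase I cuts immediately before an internal +D unit (I2S)". Because the rule is a simplification, the sequences that survive it need not coincide with those listed in Table 7. Fragments are compared with the observed products by composition only, since mass alone cannot distinguish + from -.

```python
def parse(text):
    """Turn PEN text such as '±DDD-5' into [(sign, unit), ...]; unmarked units are '+'."""
    units, sign = [], "+"
    for ch in text:
        if ch in "±+-":
            sign = ch
        else:
            units.append((sign, ch))
            sign = "+"
    return units

def digest(units, cleaves_before):
    """Split a sequence wherever the rule says the enzyme cuts before a unit."""
    fragments, current = [], [units[0]]
    for unit in units[1:]:
        if cleaves_before(unit):
            fragments.append(current)
            # The new non-reducing end loses its isomeric state, hence '±'.
            current = [("±", unit[1])]
        else:
            current.append(unit)
    fragments.append(current)
    return fragments

def composition_profile(fragments):
    """What mass spectrometry can see: the units in each fragment, signs ignored."""
    return sorted("".join(sorted(u for _, u in frag)) for frag in fragments)

def hep1_rule(unit):
    # Simplifying assumption: cut before an iduronate 2-O-sulfate unit, written +D.
    return unit == ("+", "D")

# Observed products: two trisulfated disaccharides (D) and one
# pentasulfated tetrasaccharide (one D plus one 5).
OBSERVED = composition_profile([parse("±D"), parse("±D"), parse("±D5")])

def fits(text):
    """True if the simulated digest of `text` matches the observed product profile."""
    return composition_profile(digest(parse(text), hep1_rule)) == OBSERVED

for candidate in ["±DDD-5", "±DD5D", "±5DDD", "±D-DD5"]:
    print(candidate, fits(candidate))
```

Applying `fits` to every entry of a master list and keeping only the sequences that return `True` is the progressive elimination described in the text; a second enzyme simply adds a second rule and a second observed profile.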

Table 7 provides a list of sequences that satisfy the product profiles of hep I and
hep III digests of the octasaccharide 02. (a) shows the sequences that gave the di- and
tetrasaccharide fragments as observed from the mass spectrometry data. The fragments
listed below along with their masses are those generated by computer simulation of hep I
digest. (b) shows the sequences in (a) that give the hexasaccharide fragment observed in
the mass spectrometry data after hep III digestion. The fragments along with their masses
were generated by computer simulation of hep III digestion.

3. Digestion of 02 with heparinase III:

Digestion of 02 with heparinase III yielded a nonasulfated hexasaccharide of m/z
5958.7 (1731.9) and an unobserved disulfated disaccharide (to conserve sulfates). Both
short and exhaustive digests yielded the same profile.

Heparinase III treatment of 02 resulted in a major fragment of m/z 5958.7 which
was uniquely identified as a hexasaccharide with 9 sulfate groups. The only sequence
that satisfied the product profile of hep III digestion was ±DDD-5, which is shown in
Table 7. Table 7 shows that there should be a -5 (G-HNS,6S) in the reducing end [...]
fragments were accurate to two decimal places.

Thus it was possible to demonstrate the ability to converge to the final [...]

Example 3: Sequencing of an FGF-2 binding saccharide.

[...] performed to determine the mass and size of the saccharide as a complex with FGF-2
[Venkataraman et al., PNAS 96, 1892 (1999)]. Dimers of FGF-2 bound to the
saccharide (S), yielding a species with a m/z of 37,009. By subtraction of the FGF-2
molecular weight, the molecular mass of [...]
corresponding to a decasaccharide with 14 sulfates and an anhydromannitol at the
reducing end.

1. Compositional Analysis:

Compositional analysis and CE of the FGF-2 binding saccharide were completed as
described above. Compositional analysis of this sample resulted in two peaks
corresponding to ±D [...]. [...] reducing end was not observed by CE (232 nm).
Therefore, the non-reducing end was identified as +D [...] by sequencing [...].
[...] sequences with this composition are shown in Table 8(i). Of the [...] sequences, [...]
decasaccharide are shown in Table 8(ii).



[Table 8 is largely illegible in the source. Recoverable column headings: Sequence;
Fragments formed and their mass (577, 1037, 1731, 1093, 1670). One legible sequence
entry: ±DDD4-7.]

TABLE 8

2. Digestion with heparinase I and heparinase III:

To resolve the isomeric state of the internal uronic acid (+D vs. -D), exhaustive
digestion of the saccharide with heparinase I and heparinase III was performed.
Heparinase I exhaustive digestion of the saccharide results in only two species
corresponding to a trisulfated disaccharide (±D) and its anhydromannitol derivative,
while heparinase III did not cleave the decasaccharide at all.

Heparinase I digestion of the decasaccharide yielded a pentasulfated
tetrasaccharide (m/z 5286.3) with an anhydromannitol at the reducing end and a
trisulfated disaccharide of m/z 4804.6. Table 8 shows the convergence of the FGF-2
binding decasaccharide sequence. Thus, it provides a list of sequences that satisfied the
mass spectrometry product profiles of the FGF-2 binding saccharide on treatment with
hep I. Section (i) of Table 8 shows the master list of 16 sequences derived from
compositional analysis and exoenzyme sequencing of the non-reducing end. The
disaccharide unit at the non-reducing end was assigned to be a +D using exoenzymes and
the anhydromannitol group at the reducing end is shown as '. The masses of the fragments
resulting from digestion of the decasaccharide with heparinase I are shown in (ii). Also
shown in (ii) are those sequences from (i) that satisfy heparinase I digestion data.
Section (iii) of Table 8 shows the sequence of decasaccharide from (ii) that satisfies the
data from exhaustive digestion using heparinase I. This product profile may be obtained
only if there is a hep I cleavable site at every position in the decasaccharide, which led us
to converge to the final sequence DDDDD' shown in section (iii) of Table 8. The above
taken together confirm the sequence of the FGF-2 binding decasaccharide to be
DDDDD' [(I2S-HNS,6S)4-I2S-Man6S].
Example 4: Sequencing of an AT-III binding saccharide.

An AT-III binding saccharide was used as an example of the determination of a
complex sequence.

1. Compositional Analysis:

Compositional analysis and CE were completed as described above.
Compositional analysis of the AT-III binding saccharide [...]
building blocks, corresponding to ΔU2S-HNS,6S (±D), ΔU-HNAc,6S (±4) and
ΔU-HNS,3S,6S (±7) in the relative ratio of 3:1:1, respectively. [...] The shortest
polysaccharide that may be formed with this composition corresponds to a
decasaccharide [...]



[Table 9 is largely illegible in the source. Recoverable content: a numbered list of
possible sequences, and a listing of sequence and fragments formed with masses
(577), (577) and (1059).]

TABLE 9

2. Digestion with heparinase I:

[...] the decasaccharide [...] four fragments [...] 6419.7, a heptasulfated, singly
acetylated [...], a hexasulfated tetrasaccharide with m/z of 5381 [...] and a trisulfated
disaccharide (m/z 480[...]). Also present is a contaminant (†), a tetrasaccharide [...]
of AT-III binding decasaccharide [...] D4-7DD [...]
should show the appearance of a tagged D or DD residue at the reducing end. However,
we have found all the different experiments used in the elucidation of the decasaccharide
sequence to be consistent with each other in the appearance of a 4-7 tagged product and
not a D (or a DD) product. Surprisingly, this saccharide did not contain an intact AT-III
binding site, as proposed. Therefore, confirmation of the proposed sequence was sought
through the use of integral glycan sequencing (IGS) methodology. The result of IGS
agreed with our analysis. A minor contaminant saccharide has also been found. Of the
320 possible sequences, only 52 sequences satisfied heparinase I digestion data (Table
9(i)). The mass spectrum of the exhaustive digestion of the decasaccharide with
heparinase I showed m/z values that corresponded to a trisulfated disaccharide and an
octasulfated hexasaccharide, thereby further reducing the list of 52 sequences to 28
sequences (Table 9(ii)).

3. Digestion with heparinase II:

To further converge on the sequence, a 'mass-tag' was used at the reducing end
of the saccharide (Δm/z of 56.1, shown as 't'). This enabled the identification of the
saccharide sequence close to and at the reducing end. Typical yields for the mass-tag
labeling varied between 80-90% as determined by CE. Treatment of the semicarbazide
tagged decasaccharide with heparinase II resulted in the following products: m/z 5958.4
(nonasulfated hexasaccharide), m/z 5897.7 (tagged heptasulfated, singly acetylated
hexasaccharide), m/z 5380.1 (hexasulfated tetrasaccharide), m/z 5320.9 (tagged
tetrasulfated tetrasaccharide), m/z 5264.6 (tetrasulfated tetrasaccharide) and m/z 4805.0
(a trisulfated disaccharide). The m/z values of 5320.9 and 5897.7 corresponded to a
tagged tetrasulfated tetrasaccharide and a tagged heptasulfated hexasaccharide, both
containing the N-acetyl glucosamine residue. This result indicated that ±4 (I/G-HNAc,6S)
is present at the reducing end or one unit from the reducing end, thereby limiting the
number of possible sequences from 28 to 6 (Table 9(iii)).

4. Digestion with nitrous acid:

Partial nitrous acid digestion of the tagged as well as the untagged decasaccharide
provided no additional constraints but confirmed the heparinase II data. Exhaustive
nitrous acid digestion, however, gave only the reducing end tetrasaccharide (with and
without the tag) as an unclipped product. Exhaustive nitrous acid treatment of the
decasaccharide essentially gives one tetrasulfated, singly acetylated anhydromannitol
tetrasaccharide species (one tagged, m/z 5241.5, and one untagged, m/z 5186.5). This
[...] uniquely resolved the isomeric [...] reducing [...]
sulfatase, hexosaminidase and glucuronidase) resulted in the complete digestion of the
trisaccharide. Table 9 shows the convergence of the AT-III binding decasaccharide
sequence from 320 possible sequences to 52 to 28 to 6 to the [...] The
sequence of the AT-III binding decasaccharide was deduced as ±DDD4-7
(ΔU2S-HNS,6S I2S-HNS,6S I2S-HNS,6S I-HNAc,6S G-HNS,3S,6S).

Example 5: Sequencing of Hexasaccharide 1 of HLGAG

10 µM H1 was treated with 2 mM nitrous acid in 20 mM HCl at
[...] minutes such that limited degradation occurred. [...]
detected as non-covalent complexes with (arg-gly)[...]
observed, as was a tetrasaccharide and disaccharide [...] uncomplexed
peptide. Hereafter m/z values are [...]
to the saccharide + peptide [...]

After 20 minutes, nitrous acid treatment of H1 yielded starting [...] at m/z
[...] (1655.[...]), which corresponded to a hexasaccharide [...]

This sample was then subjected to exoenzyme analysis. Three exoenzymes were
added: iduronate 2-O sulfatase, iduronidase, and glucosamine 6-O sulfatase. The
nitrous acid sample was neutralized via addition of 1/5 volume of 200 [...]
acetate, 1 mg/mL BSA, pH 6.0, after which the [...]
sulfatase was added after digestion with the first two enzymes was complete. Final
enzyme concentrations were in the range of 20-40 milliunits/mL and digestion was
carried out at 37°C for a minimum of two hours.

Upon incubation with iduronate 2-O sulfatase and iduronidase, the
hexasaccharide and tetrasaccharide peaks were reduced in mass. The disaccharide was no
longer detectable after incubation with the enzymes. The hexasaccharide gave a new
species at m/z 5627.3 (1398.8) corresponding to loss of sulfate and iduronate. The
tetrasaccharide yielded a species of m/z 5049.3 (820.8), again corresponding to loss of
sulfate at the 2-O position and loss of iduronate. These data showed that all the
disaccharide building blocks contained an I2S.

Addition of glucosamine 6-O sulfatase and incubation overnight at 37°C resulted
in the production of two new species: one at m/z 5546.8 (1318.3), resulting from loss of
sulfate at the 6 position on glucosamine, and the other at m/z 5224.7 (996.2), again
corresponding to a tetrasaccharide 6-O sulfate. These data showed that, except for the
reducing end anhydromannitol-containing disaccharide unit, the other units contained
HNS. The data indicated that the sequence is DDD', indicating that this sequence was
originally derived from nitrous acid degradation, unlike the other sequences, which were
derived from degradation by the heparinases.
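
The arithmetic behind these assignments can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the average masses used for a sulfate group (SO3, 80.06 Da) and a uronic acid residue (C6H8O6, 176.12 Da) are standard chemical values rather than figures from this text, the peptide m/z is inferred from the paired values quoted above (5627.3 for the complex, 1398.8 for the saccharide), and the 1 Da tolerance is an arbitrary choice.

```python
SULFATE = 80.06    # SO3, average mass in Da (standard value, not from the text)
URONATE = 176.12   # uronic acid residue C6H8O6, average mass in Da (standard value)

def saccharide_mass(complex_mz, peptide_mz):
    """Mass of the saccharide once the peptide contribution is subtracted."""
    return complex_mz - peptide_mz

def explain_shift(shift, tol=1.0, max_each=3):
    """List (n_sulfate, n_uronate) losses whose combined mass matches a shift."""
    hits = []
    for n_sulfate in range(max_each + 1):
        for n_uronate in range(max_each + 1):
            if n_sulfate + n_uronate == 0:
                continue
            expected = n_sulfate * SULFATE + n_uronate * URONATE
            if abs(expected - shift) <= tol:
                hits.append((n_sulfate, n_uronate))
    return hits

# Peptide m/z inferred from the hexasaccharide values quoted in the text.
peptide = 5627.3 - 1398.8

# Hexasaccharide before and after glucosamine 6-O sulfatase.
before = saccharide_mass(5627.3, peptide)
after = saccharide_mass(5546.8, peptide)
print(explain_shift(before - after))   # [(1, 0)]: loss of one sulfate
```

A shift of about 256.2 Da, by the same function, is explained only by the loss of one sulfate together with one uronate, which is the combined effect expected of the 2-O sulfatase and iduronidase.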

Example 6: Sequencing of other complex polysaccharides 

The sequencing approach may be readily extended to other complex
polysaccharides by developing appropriate experimental constraints. For example, the
dermatan/chondroitin mucopolysaccharides (DCMP), consisting of a disaccharide repeat
unit, are amenable to a hexadecimal coding system and MALDI-MS. Similar to what is
observed for HLGAGs, there is a unique signature associating length and composition
with a given mass in DCMP. For instance, the minimum difference between any
disaccharide and any tetrasaccharide is 139.2 Da; therefore, the length and the number of
sulfates and acetates may be readily assigned for a given DCMP up to an
octa-decasaccharide. Similarly, in the case of polysialic acids (PSA), present mostly as
homopolymers of 5-N-acetylneuraminic acid (NAN) or 5-N-glycolylneuraminic acid
(NGN), the hexadecimal coding system may be easily extended to NAN/NGN to encode
the variations in the functional groups, enabling a sequencing approach for PSA.
DCMP are found in dense connective tissues such as bone and cartilage. The
basic repeat unit of the dermatan/chondroitin mucopolysaccharides (DCMP) [...]
an N-acetylated galactosamine. The uronic acid may be glucuronic acid [...]
to DCMP. Like the heparinases that degrade HLGAGs, there are distinct
chondroitinases and other chemical methods available that clip at specific glycosidic
linkages of DCMP and serve as experimental constraints. Furthermore, since DCMPs are
acidic polysaccharides, the MALDI-MS techniques and methods used for HLGAGs may
be readily extended to the DCMPs.

PEN scheme and mass-identity relationships for DCMP: Shown in Table 10 is
the property encoded nomenclature (PEN) of the 16 possible building blocks of the
dermatan/chondroitin family of molecules. The sequencing approach enables one to
establish important mass-identity relationships as well as master lists of all possible
DCMP sequences from disaccharides to dodecasaccharides. These are plotted [...]
mass line as shown [...]. As observed for HLGAGs, [...]
I/G   2X   6X   4X   Code   Disaccharide unit      Mass
1     0    0    0    -0     G-GalNAc               379.33
1     0    0    1    -1     G-GalNAc,4S            459.39
1     0    1    0    -2     G-GalNAc,6S            459.39
1     0    1    1    -3     G-GalNAc,4S,6S         539.45
1     1    0    0    -4     G2S-GalNAc             459.39
1     1    0    1    -5     G2S-GalNAc,4S          539.45
1     1    1    0    -6     G2S-GalNAc,6S          539.45
1     1    1    1    -7     G2S-GalNAc,4S,6S       619.51

TABLE 10

Table 10 shows the Property Encoded Numerical scheme used to code DCMPs.
The first column codes for the isomeric state of the uronic acid (0 corresponding to
iduronic and 1 corresponding to glucuronic). The second column codes for the
substitution at the 2-O position of the uronic acid (0: unsulfated, 1: sulfated). Columns 3
and 4 code for the substitution at the 6 and 4 positions of the galactosamine. Column 5
shows the numeric code for the disaccharide unit, column 6 shows the disaccharide unit
and column 7 shows the theoretical mass calculated for the disaccharide unit.
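
The coding scheme and the mass-identity relationship can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the text: the function names are hypothetical; the bit weights (4 for 2-O, 2 for 6-O, 1 for 4-O sulfation) are read off the entries of Table 10, where code 5 is the 2S,4S unit and code 6 the 2S,6S unit; and the mass of an oligosaccharide is taken to be the plain sum of the tabulated disaccharide masses. The last assumption is used because it reproduces the 139.2 Da minimum separation between disaccharides and tetrasaccharides quoted for DCMP.

```python
from collections import Counter
from itertools import product

# Table 10 masses keyed by the unsigned code digit; the sign marks iduronic (+)
# versus glucuronic (-) acid and does not change the mass.
UNIT_MASS = {0: 379.33, 1: 459.39, 2: 459.39, 3: 539.45,
             4: 459.39, 5: 539.45, 6: 539.45, 7: 619.51}

def pen_code(glucuronic, s2, s6, s4):
    """Encode one disaccharide from its four binary properties."""
    sign = "-" if glucuronic else "+"
    return f"{sign}{4 * s2 + 2 * s6 + s4}"

def mass_line(n_units):
    """Count how many unit orderings (signs ignored) share each total mass."""
    line = Counter()
    for digits in product(range(8), repeat=n_units):
        # Assumption: oligosaccharide mass is the sum of the tabulated unit masses.
        line[round(sum(UNIT_MASS[d] for d in digits), 2)] += 1
    return line

def min_gap(n_short, n_long):
    """Smallest mass difference between any two oligosaccharides of the given lengths."""
    short, long_ = mass_line(n_short), mass_line(n_long)
    return min(abs(a - b) for a in short for b in long_)

print(pen_code(True, 1, 0, 1))    # -5, i.e. G2S-GalNAc,4S
print(round(min_gap(1, 2), 2))    # disaccharide vs tetrasaccharide separation
```

Storing the code as a signed string rather than an integer keeps "-0" distinct from "+0", which an integer representation would lose.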

Tools as experimental constraints: Similar to the heparinases that degrade
HLGAGs, there are chondroitinases that degrade chondroitin-like and dermatan-like
regions of DCMP. The chondroitinases B, C, AC and ABC have distinct specificities
with some overlap. For the most part the chondroitinases cover the entire range of
linkages found in DCMP. Several chondroitinases have been isolated and
cloned from different sources. In addition to the enzymes, there are a few
well-established chemical methods that may be used to investigate DCMP. These include
nitrous acid treatment. Thus there are adequate tools (enzymatic and chemical) which
function as 'experimental constraints' to enable DCMP sequencing. Below we use two
DCMP sequences to illustrate sequencing DCMP.

A. Serpin HCF-2 binding DCMP hexasaccharide:

The minimum size DCMP binding to serpin HCF-2 was isolated and its
composition was determined using elaborate methods which included anion exchange
chromatography, paper electrophoresis and paper chromatography. The sequencing
strategy through the integration of PEN and MS established the identity of this serpin
HCF-2 binding saccharide to be a hexasaccharide with 6 sulfates and 3 acetates. The
high degree of sulfation pointed to a dermatan-like saccharide. Since this saccharide was
derived using partial N-deacetylation and nitrous acid treatment, it comprises a
5-membered anhydrotalitol ring at the reducing end. Composition analysis of the
saccharide may be obtained by degradation using the chondroitinases. The composition
[...] of ΔU2S-GalNAc,4S (±5) and ΔU2S-aTal4S (aTal: anhydrotalitol; ±5') in
a 2:1 ratio. This enabled the generation of a master list with 8 possible sequences as
shown in Table 11. 2-sulfatase and iduronidase treatment of the hexasaccharide
produced a shift in the mass spectrum corresponding to the loss of a sulfate and
iduronate, thereby fixing the I2S at the non-reducing end (Table 11b). In order to converge
further, chondroitinase B (which acts on iduronate residues in dermatan-like regions)
was used and a single peak in the mass spectrum corresponding to a 2-sulfated
disaccharide was observed. This led us to converge to the sequence +555'
(I2S-GalNAc,4S-I2S-GalNAc,4S-I2S-aTal4S).

TABLE 11

(a) Master list:

+555'     +55-5'    +5-55'    +5-5-5'
-555'     -55-5'    -5-55'    -5-5-5'

(b) 2-sulfatase, iduronidase:

+555'     +55-5'    +5-55'    +5-5-5'

Chondroitinase B:

Sequence      Fragments formed
+555'         +5  ±5  ±5'

B. Hypothetical: 

In this example a "hypothetical DCMP polysaccharide" which is more complex
than the previous example is used. Assume that MS yields a result that is interpreted to
be an octasaccharide with 8 sulfates and 4 acetates, and that the composition analysis
points to three species corresponding to ΔU2S-GalNAc,4S (±5), ΔU-GalNAc,6S (±2) and
ΔU2S-GalNAc,4S,6S (±7) in 2:1:1 relative abundance. This enables one to generate the
master list, which would point to 96 possible sequences (Table 12a). It is expected that the
digestion of the saccharide sample with chondroitinase AC would result in two products
with masses that would correspond to two tetrasulfated tetrasaccharide units and thereby
reduce the master list to 4 possible sequences (Table 12b). Complete deamination using
hydrazinolysis and nitrous acid treatment would result in 3 peaks, two corresponding to
[...]

Treatment of the degraded products with 2-sulfatase and iduronidase (and not
glucuronidase) should result in peaks that correspond to the loss of sulfate and iduronate
residues. This would enable the identification of the isomeric state of 5 and 7, thereby
converging the master list to one sequence, ±55-27
(ΔU2S-GalNAc,4S-I2S-GalNAc,4S-G-GalNAc,6S-I2S-GalNAc,4S,6S).

TABLE 12

(a) Master list of 96 sequences

(b) Chondroitinase AC:

Sequence      Fragments
±55-27        ±55       ±27
±55-2-7       ±55       ±2-7
±5-5-27       ±5-5      ±27
±5-5-2-7      ±5-5      ±2-7

Complete deamination (nitrous acid treatment); 2-sulfatase, iduronidase:

Sequence      Fragments
±55-27        ±5'  +5'  -2'  +7'


It is important to reiterate that, similar to what was developed for HLGAG, 
distinct or additional 'convergence strategies or experimental constraints' may be used to 
arrive at the 'unique' solution for DCMP. 

2. Polysialic Acid 

Polysialic acids are linear complex polysaccharides found as a highly regulated
post-translational modification of the neural cell adhesion molecule in mammals, and are
present mostly as homopolymers of 5-N-acetylneuraminic acid (NAN) or
5-N-glycolylneuraminic acid (NGN). The monomeric units of NAN and NGN are linked by
α2-8 glycosidic linkages, and may be modified at the 4-O, 7-O and 9-O positions. The
major modification is acetylation. In addition, much rarer modifications including
sulfation and lactonization occur at the 9-O position. A deaminated form of neuraminic
acid, namely 5-deamino-3,5-dideoxyneuraminic acid (KDN), has also been discovered.
The PEN-MS sequencing approach is extended to polysialic acids, and using NAN and
NGN units we illustrate how this is achieved.
PEN scheme and mass-identity relationships for PSA: PSA [...]
the dimeric repeats for HLGAG and DCMP. The PEN scheme for PSA is shown in
Table 13. The [...]
NAN and NGN (Figure 9A and 9B). The minimum [...]
penta- and hexasaccharide, thereby providing a safe margin for detection of these
fragments using MS.



NAN/NGN   9X   7X   4X   Code   Saccharide unit      Mass
0         0    0    0    0      NAN                  309.28
0         0    0    1    1      NAN4Ac               351.32
0         0    1    0    2      NAN7Ac               351.32
0         0    1    1    3      NAN4Ac,7Ac           393.36
0         1    0    0    4      NAN9Ac               351.32
0         1    0    1    5      NAN4Ac,9Ac           393.36
0         1    1    0    6      NAN7Ac,9Ac           393.36
0         1    1    1    7      NAN4Ac,7Ac,9Ac       435.40
1         0    0    0    -0     NGN                  325.27
1         0    0    1    -1     NGN4Ac               367.32

TABLE 13

Shown in Table 13 is the Property Encoded Numerical scheme for PSA. Column
1 codes for whether the monomeric unit is NAN or NGN. Columns 2, 3 and 4 code for
the variations in the 9, 7 and 4 positions respectively, where 1 corresponds to acetylated
and 0 corresponds to unacetylated. Column 5 shows the numeric code for the PSAs; -0 to
-7 was used instead of 8-F, so that the number codes for the variability in
acetylation and the sign indicates whether the unit is NAN or NGN. Column 6 lists the
monosaccharide represented by the code in column 5. Column 7 lists the theoretical
mass calculated for the monomeric units shown in column 6.
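
The PSA scheme can be written out as a small encoder. The sketch below is illustrative: the function names are hypothetical, and the per-acetyl increment of 42.04 Da is inferred from the differences between the masses in Table 13 rather than stated in the text, so computed masses may differ from the tabulated ones in the last decimal place.

```python
NAN_MASS = 309.28   # unmodified NAN, from Table 13
NGN_MASS = 325.27   # unmodified NGN, from Table 13
ACETYL = 42.04      # mass added per O-acetyl group (inferred from Table 13)

def psa_code(is_ngn, ac9, ac7, ac4):
    """Encode one monomer: digit from the 9-, 7- and 4-position bits, '-' prefix for NGN."""
    digit = 4 * ac9 + 2 * ac7 + ac4
    return f"-{digit}" if is_ngn else str(digit)

def psa_mass(code):
    """Theoretical mass of the monomer named by a code such as '5' or '-1'."""
    is_ngn = code.startswith("-")
    digit = int(code.lstrip("-"))
    n_acetyl = bin(digit).count("1")   # each set bit is one acetylated position
    base = NGN_MASS if is_ngn else NAN_MASS
    return round(base + n_acetyl * ACETYL, 2)

print(psa_code(False, 1, 0, 1))   # 5, i.e. NAN4Ac,9Ac
print(psa_code(True, 0, 0, 0))    # -0, i.e. unmodified NGN
print(psa_mass("7"))              # compare with 435.40 in Table 13
```

Keeping the code as a string preserves the distinction between "0" (NAN) and "-0" (NGN), which is the reason the scheme uses a sign rather than the hexadecimal digits 8 to F.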

The mass-line for the combinations of substituted/unsubstituted NAN-containing
monomeric units in PSA is shown in Figure 9A. The X axis represents the calculated
masses for monosaccharide to hexasaccharides. Shown on the Y axis is the number of
fragments of a particular length and composition that exist for a given mass. The values
150-190 were omitted to improve the clarity of the other peaks. The minimum
difference between any monosaccharide and any disaccharide is 165.2 Da, between any
di- and any trisaccharide is 39.03 Da, between any tri- and any tetrasaccharide is
39.03 Da, and 3.01 Da for all higher order saccharides.

The mass-line for the combinations of substituted/unsubstituted NGN monomeric
units in PSA is shown in Figure 9B. The X axis represents the calculated masses for
monosaccharide to hexasaccharide. Shown on the Y axis is the number of fragments of
a particular length and composition that exist for a given mass. The values 150-190 were
omitted to improve the clarity of the other peaks. The minimum difference between any
monosaccharide and any disaccharide is 181.2 Da, between any di- and any trisaccharide
is 55.03 Da, and 13 Da for higher order saccharides.

Tools as experimental constraints: There are several tools and detection methods
available for studying PSAs. Based on the properties of the building blocks of PSA, this
class of linear polysaccharides is amenable to MS. Methods of purifying PSA polymers
and obtaining composition using HPLC, CE and mass spectrometry have very recently
been established. Enzymatic tools from various sources have been used to study PSA
extensively, notably the bacterial exosialidase, which cleaves PSA polymers processively
from the non-reducing end, and the bacteriophage-derived endoneuraminidase, which clips
endolytically both the NAN- and NGN-containing PSA linear polysaccharides. In
addition to these enzymes, chemical methods such as hydrazinolysis followed by nitrous
acid treatment, and periodate oxidation followed by sodium [...] may
be used as tools to degrade PSA polysaccharides into smaller polysaccharides.

Example 7: Variation of experimental conditions [...]
enzymatic reactions and its effect on the methods

Secondary specificities of the heparinases have been observed, especially under
[...] enzymology of heparinases, the relative rates of cleavage of [...] sites [...]
heparinase [...] and III with defined substrates under [...]
(H. E. Conrad, Heparin-Binding Proteins [...]) [...]
difference in the rates of cleavage, with I-containing linkages [...]
heparinase III of tetrasaccharides containing either G [...], I [...] or [...] linkages [...]
does not cleave [...]-containing glycosidic linkages, and cleaves G-containing linkages
[...]. [Briefly, digests were designated as either "short" or "exhaustive". "Short" digests
were completed with 50 nM enzyme for 10 minutes. Exhaustive digests were completed
using 200 nM enzyme for either four hours or overnight. Partial nitrous acid cleavage was
completed using a modification of published procedures. Briefly, to an aqueous solution
of saccharide was added a 2x solution of sodium nitrite in HCl such that the [...]
the nitrous acid [...] anhydromannose [...] anhydromannitol. The entire panel of HLGAG
[...] used as suggested by the manufacturer.] For example, with the hexasaccharide
ΔUHNH,6SGHNSIHNAc (which contains both I and G in a minimally sulfated region),
cleavage occurs only at the G under "short" digest conditions, as shown in Table 14.



Table 14

Species                  m/z (+ Peptide)    Observed
ΔUHNH,6SGHNSIHNAc        5442.1
ΔUHNSIHNAc               5023.6
ΔUHNH,6SGHNS             5061.7


Heparinase II was incubated with the hexasaccharide ΔUHNH,6SGHNSIHNAc and
only cleavage at the G and not the I was observed. Furthermore, we have found that
degree of sulfation does affect the kinetics of heparinase III degradation of
oligosaccharides [S. Ernst et al., Crit. Rev. Biochem. Mol. Biol. 30, 387 (1995); S.
Yamada et al., Glycobiology 4, 69 (1994); U.R. Desai, H.M. Wang, R.J. Linhardt,
Biochemistry 32, 8140 (1993); R.J. Linhardt et al., Biochemistry 29, 2611 (1990)]. In the
case of heparinase I, this enzyme does not clip either I- or G-containing glycosidic
linkages within the context of our experimental procedures, whereas it readily clips
I2S-containing polysaccharides (Figure 10B). Figure 10C shows the same study as
completed in (A) except heparinase I was used instead of heparinase III. With
heparinase I, cleavage only occurs at I2S-containing linkages but not before I or G. There
is only one report of heparinase I clipping G2S-containing linkages [S. Yamada, T.
Murakami, H. Tsuda, K. Yoshida, K. Sugahara, J. Biol. Chem. 270, 8696 (1995)], which
was tested with two tetrasaccharide substrates, and the experiments were performed under
conditions which are kinetically very different from the 'short' heparinase I digestion
presented here.

Quite a few factors have severely limited and complicated prior art studies and
interpretation of heparinase substrate specificity experiments. First, not only is a
homogeneous substrate preparation difficult, but also analyzing the substrates and
products has been very challenging. Analysis has primarily relied on co-migration of
the saccharides with known standards, and as others and we have observed,
oligosaccharides with different sulfation patterns do co-migrate, complicating unique
assignments. Further, some oligosaccharides used in previous studies to assign substrate
specificity for the heparinases were not homogeneous, complicating analysis. The [...]

One of the major strengths of the sequencing strategy of the invention is the flexibility of
our approach and the integration of MALDI and the coding scheme, which enable the
method to adapt to different experimental constraints [For example, the recently cloned
mammalian heparanase is another possible experimental constraint. M.D. Hulett et al.,
Nat. Med. 5, 793 (1999); I. Vlodavsky et al., Nat. Med. 5, 803 (1999)]. As stated,
additional or different sets of experimental constraints may be used not only to arrive at a
unique solution but also to validate or confirm the solution from a given set
of experimental constraints.

Example 8: Methods for identifying protein-polysaccharide interactions and
improved methods for sequencing.

To identify HLGAG sequences that bind to a particular protein, the most
common methodology involves affinity fractionation of oligosaccharides using a
particular HLGAG subset, namely porcine intestinal mucosa heparin. Enzymatically or
chemically derived heparin oligosaccharides of a particular length are passed over a
column of immobilized protein. After washing, the bound fraction is eluted using high
salt to disrupt interactions between the sulfates on the polysaccharide and basic residues
on the protein, interactions which are crucial for binding. Eluted oligosaccharides are
then characterized, typically by NMR. In this manner, sequences that bind to a number
of proteins, including antithrombin III (AT-III), basic fibroblast growth factor (FGF-2),
and endostatin, have been identified.

While rigorous and well tested, this approach suffers from a number of
limitations. First, column chromatography requires large (milligram) amounts of
material for successful analysis. Of the entire family of HLGAGs, only heparin is
available in these quantities. However, heparin, due to its high sulfate content, contains
a limited number of sequences, biasing the selection procedure. Thus, there is no
opportunity to sample or select for unusual sequences that might in fact bind with high
affinity. In vivo, HLGAG-binding proteins sample and bind to the more structurally
diverse heparan sulfate (HS) chains of proteoglycans at the cell surface, where heparin-like
sequences (i.e., sequences with a high degree of sulfation) do not always
predominate. Heparin, while structurally related to HS, is present in vivo only in mast
cells. For these reasons, heparin is not always an appropriate analog of cell surface HS,
and in fact, the exclusive use of heparin in affinity fractionation experiments has created
confusion in the field. One example illustrates this point.
dryness. Saccharides were bound to immobilized proteins by spotting 1 µl of aqueous
solution on the protein spot for at least five minutes. Unbound saccharides were removed
by washing with water fifteen times. For selection experiments, the spot was washed ten
times with various NaCl concentrations, followed by ten water washes. Caffeic acid
matrix in 50% acetonitrile with 2 pmol/µl (RG)19R was added to the spot prior to MALDI
analysis. All saccharides were detected as noncovalent complexes with (RG)19R using
MALDI parameters described herein.

Saccharide digestion by heparinase I or III. Saccharides selected for FGF-2
binding were digested with heparinases I or III by spotting 8 µg of enzyme in water after
selection was completed. The spot was kept wet for the desired digestion time by adding
water as necessary. Caffeic acid matrix with 2 pmol/µl (RG)19R was added to the spot for
MALDI analysis.

Isolation, Purification, and Selection of FGF binders from SMC heparan
sulfate. Bovine aortic smooth muscle cells (SMCs) were grown to confluency. Cells
were washed twice with PBS and then 200 nM heparinase III was added for 1 hr. The
supernatant was heated to 50°C for 10 minutes to inactivate heparinase III and filtered.
To remove polynucleotide contamination, the samples were treated with DNAse and
RNAse at room temperature overnight. Heparan sulfate was isolated by binding to a
DEAE filter, washing away unbound material, and elution using 10 mM sodium
phosphate, 1 M NaCl, pH 6.0. The material was then concentrated and buffer exchanged
into water using a 3,000 MWCO membrane. The retentate was lyophilized and
reconstituted in water. 100 nM heparinase II was added and aliquots were taken at 5, 10,
20, and 30 minutes post-addition. 1 µL was spotted on FGF. After drying, the sample
was washed, 2 pmol/µl (RG)19R in matrix was added, and the sample was analyzed as
outlined above.

Results: 

Saccharide binding to FGF-2 and FGF-1. As a first step towards the
development of a viable MALDI selection procedure, the FGF system, using its
prototypic members, viz. FGF-1 and FGF-2, was selected. Initial experiments involved
the use of a purified polysaccharide (Hexa 1 of Table 21) that is known to bind with high
affinity to FGF. We found that Hexa 1 binds to FGF-2 and was detected
even with a salt wash of 0.5 M NaCl, consistent with the known affinity of Hexa 1 for
FGF-2. In addition, when an equimolar mixture of Hexa 1 and Hexa 2 (a low affinity
binder) was applied to FGF-2 and washed with 0.2 M NaCl [...]
binding, only Hexa 1 was observed [...] under the
conditions of the experiment, immobilized FGF-2 retained a [...] binding
specificity as FGF in solution. [...]
was confirmed by heparinase III cleavage, which results
octa- and nonasulfated hexasaccharides and sequenced using the methods described
herein. Thus, the sequence of the nonasulfated hexasaccharide is ±DDD
(ΔU2SHNS,6SI2SHNS,6SI2SHNS,6S) and the sequence of the octasulfated hexasaccharide is ±DD-5.

Saccharide Binding to Antithrombin-III. ATIII is heavily glycosylated; therefore
we anticipated that it would not bind well to the MALDI plate. As an alternative
strategy, avidin was immobilized on the plate and biotinylated AT-III was bound to the
avidin. The ATIII biotinylation reaction was carried out in the presence of heparin to
protect the protein's binding site for HLGAG oligosaccharides. After washing off the
complexed heparin, penta 1, which contains an intact AT-III pentasaccharide binding
sequence, was used to verify that the protein was immobilized on the surface and was
able to bind saccharides. Penta 1 binding to ATIII was observed up to washes of 0.5 M
NaCl, consistent with it being a strong binder to ATIII.

Furthermore, this binding is also specific. Introduction of a solution of hexa 1,
hexa 2, and penta 1 to immobilized ATIII followed by a 0.2 M salt wash to remove
non-specific binders resulted in signal only for penta 1. Interestingly, there was no signal
from hexa 2, which contains a partially intact ATIII binding site, suggesting that, under our
selection conditions, only sequences with a full binding site will be selected for.

Selection of FGF-2 Binders in SMC HS. Heparan sulfate at the cell surface of
SMCs is known to contain high affinity sites for FGF binding. In an effort to extend our
initial studies with highly sulfated heparin, we sought to identify high affinity FGF
binders in heparan sulfate proteoglycans at the cell surface of SMCs. To this end, SMCs
were treated with either heparinase I or heparinase III and the HLGAGs isolated and
purified. Consistent with the known substrate specificity of the enzymes, the
composition of released fragments is different. Fragments were then treated with
heparinase II to reduce them in size. At certain time points, the digest was spotted on
FGF-2 and the selection process was accomplished as outlined above. Consistent with our
findings with heparin, a single hexasaccharide was identified to be a high affinity binder
for FGF-2, namely the nonasulfated hexasaccharide with a sequence ±DDD.

The above methodology describes an alternative protocol for the selection of
saccharide binders to proteins. This methodology has been applied towards the
identification of oligosaccharides derived from heparin that bind to two well-established
systems, FGF and ATIII. As shown, this procedure produces identical results to the
more established methodology of affinity fractionation. For FGF-1 and FGF-2, high



WO 00/65521 

PCT/US00/10990 


affinity binders can be selected out of a pool of similar saccharides. In addition, ATIII
can be selected for high affinity binders [...]

[...] derive [...] information from [...]

on a target. Second, and more substantially, the analysis [...]

advance makes it feasible to use the more biologically relevant HS isolated from the cell
surface rather than highly sulfated heparin from mast cells. Finally, [...]

Example 9: Methods for identifying branching [...]
branched polysaccharides.

Increasing evidence exists that glycosylation patterns are highly influenced by the
[...] glycan structure, especially in [...] degree of branching. For instance, in pathogenic
[...] GlcNAc residues and increased levels of tri- and tetraantennary structures. By judicious
[...] As shown [...]
core structures generated from complex N-glycan structures [...]
glycan structures were enzymatically [...]
and identified [...] MALDI-MS [...]
derived from enzymatic treatment of a mixture of bi- and triantennary [...]
[...] pmol of each saccharide was subjected to digest with an enzyme cocktail [...]
mass signature of 1462.4 indicates that one of the structures is biantennary with a core
fucose while the mass signature of 1665.8 is indicative of a triantennary
structure also with a core fucose. [O] = mannose; [^r] = fucose; [E3] = N-acetylglucosamine;
[□] = galactose; and [A] = N-acetylneuraminic acid.

MALDI-MS sequencing of the N-linked polysaccharide of [...]
sequencing of the glycan structure of PSA from normal prostate tissue was performed






(Figure 12). Figure 12 shows data arising from MALDI-MS microsequencing of the PSA
polysaccharide structure. MALDI-MS was completed using 500 fmol of saccharide.
Analysis was completed with a saturated aqueous solution of 2,5-dihydroxybenzoic acid with
300 mM spermine as an additive. Analytes were detected in the negative mode at an
accelerating voltage of 22 kV. 1 µL of matrix was added to 0.5 µL of aqueous sample
and allowed to dry on the target. (A) MS of the intact polysaccharide structure. Peaks
marked with an asterisk are impurities, and the analyte peak is detected both as M-H
(m/z 2369.5) and as a monosodiated adduct (M+Na-2H, m/z 2392.6). (B) Treatment of
[A] with sialidase from A. ureafaciens. 10 pmol of saccharide was incubated with enzyme
overnight at 37°C in 10 mM sodium acetate pH 5.5 according to the manufacturer's
instructions. Two new saccharides were seen, the first at m/z 2078, corresponding to the
loss of one sialic acid moiety, and the second at m/z 1786.9, corresponding to the loss of
two sialic acids from the non-reducing end. (C) Digest of [B] with galactosidase from S.
pneumoniae. Digest procedures were completed essentially as described above. A single
product at m/z 1462.8 indicated that two galactose residues were removed upon
treatment of [B] with the enzyme. (D) Digest of [C] with N-acetylhexosaminidase from
S. pneumoniae. One product was observed as both M-H (m/z 1056.3) and M+Na-2H
(m/z 1078.1), corresponding to the loss of two N-acetylhexosamine units from [C]. A
table of the analysis scheme with schematic structure and theoretical molecular masses
is presented in the center of Figure 12. Shown are the parent polysaccharide and
enzymatically derived products seen in this analysis. [O] = mannose; [^r]= fucose;
[E3]= N-acetylglucosamine; [□]= galactose; and [A]=N-acetylneuraminic acid.
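
The residue counts read off the Figure 12 peaks can be checked with simple arithmetic. The following is a minimal sketch and not part of the original disclosure: the residue masses are standard average values assumed here, and the names `RESIDUE_MASS` and `residues_lost` are hypothetical.

```python
# Assumed average residue masses in Da (standard values, not taken from the text).
RESIDUE_MASS = {
    "NeuAc": 291.26,   # N-acetylneuraminic (sialic) acid
    "Gal": 162.14,     # galactose (a hexose)
    "HexNAc": 203.19,  # N-acetylhexosamine
}


def residues_lost(mz_before, mz_after, residue):
    """Estimate how many residues of one type account for a mass decrease."""
    delta = mz_before - mz_after
    return round(delta / RESIDUE_MASS[residue])


# M-H peaks reported for Figure 12, panels (A) through (D).
steps = [
    ("sialidase, first product", 2369.5, 2078.0, "NeuAc"),
    ("sialidase, second product", 2369.5, 1786.9, "NeuAc"),
    ("galactosidase", 1786.9, 1462.8, "Gal"),
    ("N-acetylhexosaminidase", 1462.8, 1056.3, "HexNAc"),
]

for label, before, after, residue in steps:
    print(f"{label}: {residues_lost(before, after, residue)} x {residue}")
```

Each enzyme step is treated independently, so the same helper could be applied to the intact-glycoprotein shifts as well, provided the mass accuracy is good enough to distinguish one residue from two.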

Studies of the intact polysaccharide via NMR (large quantities of PSA were
required for this study) yielded sequence information of the glycan [Belanger, A., van
Halbeek, H., Graves, H.C.B., Grandbois, K., Stamey, T.A., Huang, L., Poppe, L., and
Labrie, F., Prostate, 1995. 27: p. 187-197]. Similar to other N-linked glycoproteins, as
stated above, PSA contains a core biantennary branched motif. Extending from each
mannose arm of PSA is a trisaccharide unit. Together these modifications indicated an
expected molecular mass of 2370 Da for the intact polysaccharide. Using MALDI-MS
and an exoglycosidase array we have sequenced the putative structure for the N-linked
polysaccharide on PSA (Figure 12). Analysis of the intact polysaccharide yields a
molecular mass of 2370 Da (Figure 12A), identical to the predicted molecular mass
sample was sequentially treated with the exoenzymes (B-D). After overnight incubation
at 37°C, 1 pmol of the digested PSA was examined by mass spectrometry. Briefly, the
aqueous sample was mixed with sinapinic acid in 30% acetonitrile, allowed to dry, and
then examined by MALDI TOF. All spectra were calibrated externally with a mixture of
myoglobin, ovalbumin, and BSA to ensure accurate molecular mass determination. (A)
PSA before the addition of exoenzymes. The measured mass of 28,478 agreed well with
the reported value of 28,470. (B) Treatment of (A) with sialidase resulted in a mass
decrease of 287 Da, consistent with the loss of one sialic acid residue. (C) Treatment of
(B) with galactosidase. A further decrease of 321 Da indicated the loss of two galactose
moieties. (D) Upon digestion of (C) with hexosaminidase, a decrease of 393 Da
indicated the loss of two N-acetylglucosamine residues.

The protein had a measured mass of 28,478.3 (Figure 13A). Treatment of the
intact protein with sialidase resulted in a decrease of 287 Da, consistent with the loss of
one sialic acid residue (Figure 13B). Additional treatment with galactosidase resulted in
a decrease in mass of 321 Da, consistent with the loss of two galactose residues (Figure
13C). Finally, treatment with N-acetylhexosaminidase resulted in cleavage of two
GlcNAc moieties (Figure 13D).

Glycotyping of PSA by EndoF2 Treatment. EndoF2 is an endoglycanase that
clips only biantennary structures. Tri- and tetraantennary structures do not serve as
substrates for this enzyme (Figure 14). In this way, EndoF2 treatment of a glycan
structure, either attached to the protein or after isolation, was used to identify branching
identity. This becomes especially important in light of the fact that aberrant changes in
glycosylation patterns usually result in increased branching. In addition, EndoF2 was
used to cleave glycan structures that were still attached to the protein of interest. Indeed,
treatment of PSA with EndoF2 resulted in a mass shift, consistent with the loss of a
biantennary, complex type glycan structure. Figure 14 shows the results of treatment of
biantennary and triantennary saccharides with endoglycanase F2. (A) Treatment of the
biantennary saccharide resulted in a mass decrease of 348.6, indicating cleavage between
the GlcNAc residues. (B) Treatment of the triantennary saccharide with the same
substituents resulted in no cleavage, showing that EndoF2 primarily cleaves biantennary
structures. (C) EndoF2 treatment of heat denatured PSA. There was a mass reduction of
1709.7 Da in the molecular mass of PSA (compare 11C and 11A), indicating that the
normal glycan structure of PSA was biantennary.
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What is claimed is: 
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Claims 



1. A data structure, tangibly embodied in a computer-readable medium, representing
a polymer of chemical units, the data structure comprising:

an identifier including one or more fields, each field for storing a value
corresponding to one or more properties of the polymer,

wherein at least one field stores a non-character-based value.

2. The data structure of claim 1, wherein each of the fields is capable of storing a
binary value.

3. The data structure of claim 1, wherein the identifier is representable as a
single-digit hexadecimal number.
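
One way to picture claims 2 and 3 is a four-bit identifier in which each bit records a yes/no property of a chemical unit, so that the whole identifier prints as one hexadecimal digit. This is a minimal sketch only: the property names in `FIELDS` and their bit positions are hypothetical and are not the assignment used in the specification.

```python
# Hypothetical bit assignment for illustration (not the specification's table).
# The first name maps to the most significant bit.
FIELDS = ("sulfated_2O", "sulfated_6O", "sulfated_3O", "sulfated_N")


def encode(properties):
    """Pack boolean properties into a 4-bit integer identifier."""
    value = 0
    for name in FIELDS:
        value = (value << 1) | int(bool(properties.get(name, False)))
    return value


def decode(identifier):
    """Unpack a 4-bit identifier back into a dict of booleans."""
    top = len(FIELDS) - 1
    return {
        name: bool((identifier >> (top - i)) & 1)
        for i, name in enumerate(FIELDS)
    }


unit = {
    "sulfated_2O": False,
    "sulfated_6O": True,
    "sulfated_3O": False,
    "sulfated_N": True,
}
ident = encode(unit)
print(format(ident, "X"))  # the identifier as a single hexadecimal digit
```

Because every field is a single bit, four fields always fit in one hexadecimal digit; wider identifiers would simply need more digits.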

4. The data structure of claim 1, wherein the identifier is representable as a decimal
value.

5. The data structure of claim 4, wherein the decimal value may be reduced to a
plurality of prime divisors, wherein each prime divisor represents a building block of the
polymer.
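
The prime-divisor idea in claim 5 can be illustrated by giving each building block its own prime and multiplying. The sketch below is illustrative only: the block labels and prime assignments in `PRIME_FOR_BLOCK` are hypothetical, and the product as written records composition but not the order of the blocks.

```python
# Hypothetical assignment of primes to building blocks (illustration only).
PRIME_FOR_BLOCK = {"A": 2, "B": 3, "C": 5, "D": 7}
BLOCK_FOR_PRIME = {prime: block for block, prime in PRIME_FOR_BLOCK.items()}


def encode_composition(blocks):
    """Multiply the primes of all building blocks into one decimal value."""
    value = 1
    for block in blocks:
        value *= PRIME_FOR_BLOCK[block]
    return value


def decode_composition(value):
    """Factor the value back into a count of each building block."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    counts = {}
    for prime, block in sorted(BLOCK_FOR_PRIME.items()):
        while value % prime == 0:
            counts[block] = counts.get(block, 0) + 1
            value //= prime
    if value != 1:
        raise ValueError("value has a prime divisor with no assigned building block")
    return counts


code = encode_composition(["D", "D", "A"])
print(code, decode_composition(code))
```

Unique factorization guarantees that the counts recovered by `decode_composition` are unambiguous for any value built from the assigned primes.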

6. The data structure of claim 1, wherein the polymer of chemical units comprises a
polysaccharide and wherein each of the chemical units is a saccharide.

7. The data structure of claim 1, wherein the polymer of chemical units comprises a
nucleic acid and wherein each of the chemical units is a nucleotide.

8. The data structure of claim 1, wherein the polymer of chemical units comprises a
polypeptide and wherein each of the chemical units is an amino acid.

9. The data structure of claim 1, wherein the one or more properties comprise one or
more chemical unit properties, each chemical unit property being a property of one of the
chemical units of the polymer.
10. The data structure of claim 9, wherein the [...] identity of a chemical
unit of the polymer.

12. The data structure of claim 9, wherein [...] confirmation of a chemical unit of the
polymer.

13. The data structure of claim 9, wherein the [...] identity being an identity of a
substituent of [...]

14. The data structure of claim 1, wherein [...] one or more properties of the polymer.

15. The data structure of claim 14, wherein [...] charge of the polymer.

16. The data structure of claim 14, wherein [...] number of [...] of the polymer.

17. The data structure of claim 14, wherein the [...] comprise
dye-binding of the polymer.

18. The data structure of claim 14, wherein the one or more properties [...] one or
more properties of a polysaccharide.

19. The data structure of claim 18, wherein the [...]
20. The data structure of claim 18, wherein the one or more properties of a
polysaccharide include one or more compositional ratios of iduronic versus glucuronic.

21. The data structure of claim 18, wherein the one or more properties of a
polysaccharide include enzymatic sensitivity.

22. The data structure of claim 14, wherein the one or more properties comprise a
mass of the polymer.

23. The data structure of claim 14, wherein the one or more properties comprise
degree of sulfation.

24. The data structure of claim 14, wherein the one or more properties comprise
charge.

25. The data structure of claim 14, wherein the one or more properties comprise
chirality.

26. The data structure of claim 1, wherein the identifier comprises a numerical
identifier.

27. A computer-implemented method for generating a data structure, tangibly
embodied in a computer-readable medium, representing a polymer of chemical units, the
method comprising an act of:

generating an identifier including one or more fields for storing values,
each value corresponding to one or more properties of the polymer,
wherein at least one field stores a non-character-based value.

28. A computer-implemented method for determining whether properties of a query
sequence of chemical units match properties of a polymer of chemical units, the query
sequence being represented by a first data structure, tangibly embodied in a
computer-readable medium, including an identifier that includes one or more fields, each field
for storing a value corresponding to one or more [...] the
polymer being represented by a second data structure [...] polymer, the method
comprising acts of:

(A) generating at least one mask based on the values stored in the one
or more fields of the first data structure;

(B) performing at least one binary operation on the values stored in the
[...] generate at least one result; and

(C) determining whether the one or more properties of the query
sequence match the one or more properties of the polymer based on the at least
one result.
29. The method of claim 28, wherein each of the one or more fields of the first and
second data structures is a bit field.

30. The method of claim 28, wherein the act (A) comprises an act of:

(A)(1) generating the at least one mask as a sequence of bits that is
equivalent to the values stored in the fields of the first data structure.

31. The method of claim 28, wherein the act (A) comprises an act of:

(A)(1) generating the at least one mask as a sequential repetition of the
values stored in the fields of the first data structure.

32. The method of claim 28, wherein the at least [...]
logical OR operation to generate the at least one result.
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33. The method of claim 28, wherein the act (C) comprises an act of: 

(C)(1) determining that the one or more properties of the query sequence 
match the one or more properties of the polymer when the at least one result has a 
non-zero value. 

34. The method of claim 28, wherein the at least one binary operation comprises at 
least one logical AND operation. 
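
The mask-based comparison recited in claims 28 to 34 can be sketched as follows. This is an illustrative reading under stated assumptions, not the specification's implementation: the 4-bit field width, the packing order, and the names `pack`, `repeated_mask`, and `matches` are all hypothetical. The mask is built as a sequential repetition of the query field, combined with the polymer by a logical AND, and a non-zero result is treated as a match.

```python
FIELD_BITS = 4  # assumed width of each chemical unit's bit field
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack(fields):
    """Concatenate per-unit bit fields into one integer, first unit most significant."""
    packed = 0
    for field in fields:
        packed = (packed << FIELD_BITS) | (field & FIELD_MASK)
    return packed


def repeated_mask(query_field, n_units):
    """Build a mask by repeating the query's field value once per unit."""
    return pack([query_field] * n_units)


def matches(polymer_fields, query_field):
    """Return True when the AND of the packed polymer and the mask is non-zero."""
    mask = repeated_mask(query_field, len(polymer_fields))
    return (pack(polymer_fields) & mask) != 0


polymer = [0b0101, 0b0000, 0b0010]
print(matches(polymer, 0b0100))  # some unit has the queried bit set
print(matches(polymer, 0b1000))  # no unit has the queried bit set
```

A single AND over the packed integer tests every unit at once, which is the practical appeal of storing properties as bit fields rather than characters.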

35. A database, tangibly embodied in a computer-readable medium, for storing 
information descriptive of one or more polymers, the database comprising: 

one or more data units corresponding to the one or more polymers, each 
of the data units including an identifier that includes one or more fields, each 
field for storing a value corresponding to one or more properties of the polymer. 

36. A method for determining whether complete building blocks of a query sequence 
of chemical units match complete building blocks of a polysaccharide, the query 
sequence being represented by a first data structure, tangibly embodied in a computer- 
readable medium, including an identifier that includes one or more fields, each field for 
storing a value corresponding to a complete building block of the query sequence, the 
polysaccharide being represented by a second data structure, tangibly embodied in a 
computer-readable medium, including an identifier that includes one or more fields, each 
field for storing a value corresponding to a complete building block of the 
polysaccharide, the method comprising acts of: 

(A) generating at least one mask based on the values stored in the one 
or more fields of the first data structure; 

(B) performing at least one binary operation on the values stored in the 
one or more fields of the second data structure using the at least one mask to 
generate at least one result; and 

(C) determining whether the complete building blocks of the query 
sequence match the complete building blocks of the polysaccharide based on the 
at least one result. 
37. The method of claim 36, wherein each of the [...]
second data structures is a bit field.

38. A data structure, tangibly embodied in a computer-readable medium, [...]
polysaccharide, the data structure comprising:

an identifier including one or more fields, each field for [...]
corresponding to a complete building block [...]

41. The data structure of claim 38, wherein the identifier is representable as a decimal
value.

42. The data structure of claim 41, wherein the [...]

43. A data structure, tangibly embodied in a computer-readable medium, [...]
a chemical unit of a polymer, the data structure comprising:

an identifier including one or more fields, each field for storing a value
corresponding to one or more properties of the chemical unit,

wherein at least one field stores a non-character-based value.
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46. The data structure of claim 43, wherein the one or more properties include a 
confirmation of the chemical unit. 

47. The data structure of claim 43, wherein the one or more properties include an 
identity of a substituent of the chemical unit. 

48. The data structure of claim 43, wherein each of the fields is capable of storing a
binary value.

49. The data structure of claim 43, wherein the identifier is representable as a single- 
digit hexadecimal number. 

50. The data structure of claim 43, wherein the identifier is representable as a decimal 
value. 

51. The data structure of claim 50, wherein the decimal value is a primary number.

52. The data structure of claim 51, wherein the polymer is a polysaccharide, and the
primary number identifies the chemical unit as a building block of the polysaccharide.

53. The data structure of claim 43, wherein the polymer is a polysaccharide. 

54. In a system including a database of values of properties of polymers of chemical 
units, a method for determining the composition of a sample polymer of chemical units 
having a known molecular length, comprising steps of: 

(A) selecting, from the database, candidate polymers of chemical units having
the same length as the sample polymer of chemical units and for which
the value of a predetermined property is similar to the value of the
predetermined property of the sample polymer of chemical units;

(B) performing an experiment on the sample polymer of chemical units; 

(C) measuring properties of the sample polymer of chemical units resulting 
from the experiment; and 
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58- The method of claim 57, wherein the step of determ,V 
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62. The method of claim 59, wherein the sample polymer is isolated from a cell 
surface. 



63. A method for identifying a subpopulation of polymers having a property in
common with a sample polymer of chemical units, comprising:

(A) applying an experimental constraint to the polymer to modify the
polymer,

(B) detecting a property of the modified polymer;

(C) identifying a population of polymers of chemical units having the same
molecular length as the sample polymer; and

(D) identifying a subpopulation of the identified population of polymers
having the same property as the modified polymer by eliminating, from
the identified population of polymers, polymers having properties that do
not correspond to the modified polymer.

64. The method of claim 63, further comprising repeating steps (A), (B), and (D) on
the modified polymer to identify a second subpopulation within the subpopulation of
polymers having a second property in common with the twice modified polymer.

65. The method of claim 64, further comprising repeatedly performing the steps (A),
(B), and (D) on the modified polymer until the number of polymers within the
subpopulation falls below a predetermined threshold.

66. The method of claim 65, wherein the predetermined threshold of polymers within 
the subpopulation is two polymers and wherein the method is performed to identify the 
sequence of the polymer. 
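
The iterative elimination recited in claims 63 to 66 can be sketched as a filtering loop. This is a toy illustration under stated assumptions, not the specification's procedure: the unit codes "X" and "Y", the two example constraints, and the function name `narrow_candidates` are hypothetical. Each constraint pairs a function that predicts a property for a candidate with the value observed for the sample.

```python
from itertools import product


def narrow_candidates(candidates, constraints, threshold=2):
    """Apply constraints in turn, dropping candidates whose predicted property
    differs from the observed one, until fewer than `threshold` remain."""
    remaining = list(candidates)
    for predict, observed in constraints:
        remaining = [c for c in remaining if predict(c) == observed]
        if len(remaining) < threshold:
            break
    return remaining


# Toy population: every length-3 sequence over two hypothetical unit codes.
population = ["".join(units) for units in product("XY", repeat=3)]

constraints = [
    (lambda seq: seq.count("X"), 2),  # stands in for a composition measurement
    (lambda seq: seq[0], "Y"),        # stands in for an end-group measurement
]

print(narrow_candidates(population, constraints))
```

With a threshold of two, the loop stops as soon as a single candidate is left, which corresponds to identifying the sequence of the polymer.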

67. The method of claim 65, wherein the experimental constraints applied to the
polymer are different for each repetition.

68. The method of claim 63, wherein the experimental constraint applied to the 
polymer is digestion with an exoenzyme. 

69. The method of claim 63, wherein the experimental constraint applied to the 
polymer is digestion with an endoenzyme. 
70. The method of claim 63, wherein the experimental constraint applied to the
polymer is selected from the group consisting of restriction endonuclease digestion;
chemical digestion; chemical modification; interaction with a binding compound;
chemical peeling; and enzymatic modification.

71. The method of claim 63, wherein the property of the polymer is molecular
weight.

72. The method of claim 63, wherein the population of polymers of chemical units
includes every polymer sequence having the molecular weight of the sample polymer.

73. The method of claim 63, wherein the population of polymers of chemical units
includes less than every polymer sequence having the molecular weight of the sample
polymer.

74. The method of claim 63, wherein the step of detection involves the use of mass
spectrometry to determine the molecular weight of the polymer.

75. The method of claim 74, wherein the mass spectrometry is matrix assisted laser [...]

76. The method of claim 63, wherein the polymer is reduced to at least two fragments
and the property of the polymer is the size of the fragments and wherein the step of
detection involves strong ion exchange chromatography.

77. The method of claim 63, wherein the step of identifying includes selecting the
population of polymers of chemical units from a database including molecular weights of
polymers of chemical units.

78. The method of claim 77, wherein the database includes identifiers corresponding
to chemical units of a plurality of polymers, each of the identifiers including a field
storing a value corresponding to a property of the corresponding chemical unit.

79. A method for compositional analysis of chemical units of a sample polymer,
comprising:
applying an experimental constraint to the sample polymer to modify the
sample polymer,

detecting a property of the modified sample polymer;
comparing the modified sample polymer to a reference database of
polymers of identical size as the polymer, wherein the polymers of the
reference database have also been subjected to the same experimental
constraint as the sample polymer, wherein the comparison provides a
compositional analysis of the sample polymer.

80. The method of claim 79, wherein the step of detection involves capillary
electrophoresis.

81. The method of claim 79, wherein the experimental constraint applied to the
polymer involves complete degradation of the polymer into individual chemical units,
and wherein the compositional analysis reveals the number and type of units within the
polymer.

82. The method of claim 79, wherein the step of detection involves matrix assisted
laser desorption ionization mass spectrometry.

83. The method of claim 82, wherein the experimental constraint applied to the
polymer involves incomplete enzymatic digestion of the polymer and wherein steps (A),
(B), and (C) are repeated until the number of polymers within the reference database falls
below a predetermined threshold, and wherein the compositional analysis reveals the
identity of a sequence of chemical units of the polymer.

84. The method of claim 77, wherein the reference database includes identifiers
corresponding to chemical units of a plurality of polymers, each of the identifiers
including a field storing a value corresponding to a property of the corresponding
chemical unit.

85. A method for sequencing a polymer, comprising:

(A) applying an experimental constraint to the polymer to modify the
polymer,

(B) detecting a property of the modified polymer;

(C) identifying a population of polymers having the same molecular length as
the sample polymer and having molecular weights similar to the
molecular weight of the sample polymer;

(D) identifying a subpopulation of the identified population of polymers
having the same property as the modified polymer by eliminating, from
the identified population of polymers, polymers having properties that do
not correspond to the modified polymer;

(E) repeating steps (A), (B), and (D) by [...]
constraints to the polymer and identifying additional subpopulations of
polymers until the number of polymers [...]
and the sequence of the polymer may be identified.

86. A method for identifying a polysaccharide-protein interaction, comprising:

contacting a protein-coated MALDI surface with a polysaccharide-containing
sample to produce a polysaccharide-protein-coated MALDI surface,

[...]

performing MALDI mass spectrometry to identify the polysaccharide that
specifically interacts with the protein coated on the MALDI surface.

87. The method of claim 86, wherein a MALDI matrix is added to the
polysaccharide-protein-coated MALDI surface.

88. The method of claim 86, further comprising applying an experimental constraint to
the polysaccharide bound on the polysaccharide-protein-coated MALDI surface before
performing the MALDI mass spectrometry analysis.

89. The method of claim 88, wherein the experimental constraint applied to the
polymer is digestion with an exoenzyme.

90. The method of claim 88, wherein the experimental constraint applied to the
polymer is digestion with an endoenzyme.
91. The method of claim 88, wherein the experimental constraint applied to the
polymer is selected from the group consisting of restriction endonuclease digestion;
chemical digestion; chemical modification; and enzymatic modification.
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